RE: Solr regex query help

2015-01-24 Thread Arumugam, Suresh
Hi Erick, Thanks for the response. I understood the reason for the regex match not working. The help that I am looking for from this forum is as follows. 1. All the example regex queries match one term only. Is there a way in Solr to match multiple terms? 2. How can

Re: How do you parse the data in a field that is returned from a query?

2015-01-24 Thread Carl Roberts
Thanks Jack. On 1/24/15, 3:57 PM, Jack Krupansky wrote: Take a look at the RegexTransformer. Or, in some cases you may need to use the raw ScriptTransformer. See: https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler -- Jack

Re: How do you parse the data in a field that is returned from a query?

2015-01-24 Thread Carl Roberts
Yes - I am using DIH and I am reading the info from an XML file using the URL datasource, and I want to strip the cpe:/o and tokenize the data by (:) during import so I can then search it as I've described. So, my question is this: Is there any built in logic via a transformer class that
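As a rough illustration of the transformation being asked about here (strip the cpe:/o prefix, then split on colons), the logic can be sketched in Python outside of Solr. This is not a DIH transformer, and the function name tokenize_cpe is hypothetical; it only shows the intended result for values like the ones quoted in this thread.

```python
def tokenize_cpe(value, prefix="cpe:/o"):
    """Strip the leading prefix and split the remainder on ':'.

    Hypothetical helper for illustration only; not part of Solr or DIH.
    """
    if value.startswith(prefix):
        value = value[len(prefix):]
    # Drop empty pieces such as the one left by the colon after the prefix.
    return [part for part in value.split(":") if part]


tokens = tokenize_cpe("cpe:/o:freebsd:freebsd:2.2.3")
print(tokens)
```

Inside DIH, the same effect would have to come from a transformer or from the field's analysis chain, as discussed in the replies.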

Re: How do you parse the data in a field that is returned from a query?

2015-01-24 Thread Jack Krupansky
How are you currently importing data? -- Jack Krupansky On Sat, Jan 24, 2015 at 3:42 PM, Carl Roberts carl.roberts.zap...@gmail.com wrote: Sorry if I was not clear. What I am asking is this: How can I parse the data during import to tokenize it by (:) and strip the cpe:/o? On 1/24/15,

Re: How do you parse the data in a field that is returned from a query?

2015-01-24 Thread Jack Krupansky
Or, maybe... he's using DIH and getting these values from an RDBMS database query and now wants to index them in Solr. Who knows! It might be simplest to transform the colons to spaces and use a normal text field. Although you could use a custom text field type that used a regex tokenizer which

Re: How do you parse the data in a field that is returned from a query?

2015-01-24 Thread Alexandre Rafalovitch
You are using keywords here that seem to contradict each other. Or your use case is not clear. Specifically, you are saying you are getting stuff from a (Solr?) query. So, the results are now outside of Solr. Then you are asking for help to strip stuff off it. Well, it's outside of Solr, do

Re: How do you parse the data in a field that is returned from a query?

2015-01-24 Thread Carl Roberts
The unzipped XML that I am reading looks like this: <nvd xmlns:scap-core="http://scap.nist.gov/schema/scap-core/0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:patch="http://scap.nist.gov/schema/patch/0.1" xmlns:vuln="http://scap.nist.gov/schema/vulnerability/0.4"

How to index data from multiple data source

2015-01-24 Thread Yusniel Hidalgo
Dear Solr community, I started diving into Solr recently and I need help with the following usage scenario. I am working on a project to extract and search bibliographic metadata from PDF files. First, my PDF files are processed to extract bibliographic metadata such as title, authors,

How do you parse the data in a field that is returned from a query?

2015-01-24 Thread Carl Roberts
Hi, How can I parse the data in a field that is returned from a query? Basically, I have a multi-valued field that contains values such as these that are returned from a query: cpe:/o:freebsd:freebsd:1.1.5.1, cpe:/o:freebsd:freebsd:2.2.3,

Re: How do you parse the data in a field that is returned from a query?

2015-01-24 Thread Carl Roberts
Via this rss-data-config.xml file and a class that I wrote (attached) to download an XML file from a ZIP URL: dataConfig dataSource type=ZIPURLDataSource connectionTimeout=15000 readTimeout=3/ document entity name=cve-2002 pk=id

solr replication vs. rsync

2015-01-24 Thread Dan Davis
When I polled the various projects already using Solr at my organization, I was greatly surprised that none of them were using Solr replication, because they had talked about replicating the data. But we are not Pinterest, and do not expect to be taking in changes one post at a time (at least the

Re: Facet Double Counting

2015-01-24 Thread Ahmet Arslan
Hi Harish, What happens when you purge deleted terms with 'solr/core/update?commit=true&expungeDeletes=true'? ahmet On Sunday, January 25, 2015 1:59 AM, harish singh harish.sing...@gmail.com wrote: Hi, I am noticing a strange behavior with solr facet searching: This is my facet query:
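A minimal sketch of how the update URL suggested above could be assembled, with the two parameters joined by an ampersand. The base address http://localhost:8983/solr/core and the function name expunge_deletes_url are assumptions for illustration; only the path and the commit and expungeDeletes parameters come from the message.

```python
from urllib.parse import urlencode


def expunge_deletes_url(core_base):
    """Build the update URL that commits and expunges deleted documents.

    core_base is an assumed address such as http://localhost:8983/solr/core.
    """
    params = urlencode({"commit": "true", "expungeDeletes": "true"})
    return f"{core_base.rstrip('/')}/update?{params}"


url = expunge_deletes_url("http://localhost:8983/solr/core")
print(url)
```

The sketch only builds the string; sending the request is left to whatever HTTP client is already in use.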

Re: How do you parse the data in a field that is returned from a query?

2015-01-24 Thread Alexandre Rafalovitch
The main question then is whether the full cpe:/o:freebsd:freebsd:2.2.5 string needs to be stored in Solr. If the desire is to actually strip that prefix altogether and never see it in the Solr document, then Jack's suggestion is spot on. If it is to store as is but to index based on custom

Facet Double Counting

2015-01-24 Thread harish singh
Hi, I am noticing a strange behavior with solr facet searching: This is my facet query: - params: { - facet: true, - sort: startTimeISO desc, - debugQuery: true, - facet.mincount: 1, - facet.sort: count, - start: 0, - q: requestType:(*login* or

Re: How to index data from multiple data source

2015-01-24 Thread Alexandre Rafalovitch
You could use nested entities in DIH. So, if you store - for example - path to the PDF in the database, you could do a nested entity with TikaEntityProcessor to load the content. Just make sure the field names do not conflict. Regards, Alex. Sign up for my Solr resources newsletter at

Re: How to index data from multiple data source

2015-01-24 Thread Yusniel Hidalgo
Thanks Alex, indeed, the relative path to the PDF document is stored in the database. I will try to use your approach. Regards, Yusniel Hidalgo - Original Message - From: Alexandre Rafalovitch arafa...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Saturday, January 24, 2015

Re: How to inject custom response data after results have been sorted

2015-01-24 Thread Joel Bernstein
Another thing to consider... If you only need custom stats for the current result page then there is no need to keep stats for the full result set. In this case you could perform your custom collapse and generate the stats just for the current page. The ExpandComponent could be altered to do that

Re: How do you parse the data in a field that is returned from a query?

2015-01-24 Thread Jack Krupansky
Take a look at the RegexTransformer. Or, in some cases you may need to use the raw ScriptTransformer. See: https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler -- Jack Krupansky On Sat, Jan 24, 2015 at 3:49 PM, Carl Roberts

Re: Connection Reset Errors with Solr 4.4

2015-01-24 Thread Shalin Shekhar Mangar
You are probably running into https://issues.apache.org/jira/browse/SOLR-6931 On Sat, Jan 24, 2015 at 12:09 AM, Mike Drob mad...@cloudera.com wrote: I'm not sure what a reasonable workaround would be. Perhaps somebody else can brainstorm and make a suggestion, sorry. On Tue, Jan 20, 2015 at

Re: Solr regex query help

2015-01-24 Thread Erik Hatcher
If you make your field type string the regex may work as expected. But as others said, splitting into separate fields is likely more flexible. Erik On Jan 23, 2015, at 23:58, Arumugam, Suresh suresh.arumu...@emc.com wrote: Hi All, We have indexed the documents to Solr not able
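The reason a string field can help is that a regex query is matched against whole indexed terms: a tokenized text field indexes each word as its own term, while a string field indexes the entire value as one term. The following Python sketch is only an analogy for that behavior, not Solr code; the sample value, the whitespace split standing in for a tokenizer, and the helper name regex_matches_any_term are all assumptions, and Python regex syntax differs from Lucene's.

```python
import re


def regex_matches_any_term(pattern, terms):
    """Return True if the pattern matches at least one term in full."""
    compiled = re.compile(pattern)
    return any(compiled.fullmatch(term) for term in terms)


value = "user login failed"          # made-up field value
as_text_terms = value.split()        # tokenized field: one term per word
as_string_terms = [value]            # string field: the whole value is one term

print(regex_matches_any_term(r"user.*failed", as_text_terms))
print(regex_matches_any_term(r"user.*failed", as_string_terms))
```

A pattern that spans several words can only succeed when the whole value is a single term, which is also why splitting the data into separate fields is often the more flexible choice.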

Re: Solr regex query help

2015-01-24 Thread Jack Krupansky
When I first read your post I thought this example had something to do with pipe, but now I realize that ::PIPE:: is simply a symbolic representation of what we software people call a pipe, namely the vertical bar character used as a field separator. Usually, terms and tokens are all of the same

RE: Solr I/O increases over time

2015-01-24 Thread Toke Eskildsen
Daniel Cukier [danic...@gmail.com] wrote: The servers have around 4M documents and receive a constant flow of queries. When the solr server starts, it works fine. But after some time running, it starts to take longer to respond to queries, and the server I/O goes crazy to 100%. Look at the New

Re: Solr I/O increases over time

2015-01-24 Thread Arcadius Ahouansou
On 23 January 2015 at 22:52, Daniel Cukier danic...@gmail.com wrote: I am running around eight Solr server instances (version 3.5) behind a Load Balancer. All servers are identical and the LB is weighted by number of connections. The servers have around 4M documents and receive a constant flow

Re: SolrCloud Replicas fall into recovery mode right after update

2015-01-24 Thread Shalin Shekhar Mangar
What version of Solr are you using? What GC parameters are you using? Do you have GC logs enabled? Look at full GC times in those logs and see what's happening. This particular problem is usually because replicas cannot accept the rate of updates and they fall back to recovery state. You should

Re: How do you parse the data in a field that is returned from a query?

2015-01-24 Thread Carl Roberts
Sorry if I was not clear. What I am asking is this: How can I parse the data during import to tokenize it by (:) and strip the cpe:/o? On 1/24/15, 3:28 PM, Alexandre Rafalovitch wrote: You are using keywords here that seem to contradict each other. Or your use case is not clear.