Re: documentCache not used in 4.3.1?

2013-07-01 Thread Daniel Collins
We see similar results: again, we softCommit every 1s (trying to get as close to NRT as we can), and we very rarely get any hits in our caches. As an unscheduled test last week, we shut down indexing and noticed about an 80% hit rate in the caches (and average query time dropped from ~1s to 100ms!), so I think

Re: dataconfig to index ZIP Files

2013-07-01 Thread Bernd Fehling
Try setting dataSource=null for your top-level entity and use filename=\.zip$ as filename selector. On 28.06.2013 23:14, ericrs22 wrote: unfortunately not. I had tried that before with the logs saying: Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException:
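The `\.zip$` selector suggested in the reply above is a regular expression, not a shell glob. A minimal sketch of how such a pattern picks file names, using Python's `re` as a stand-in for the Java regex engine that DIH runs on. The file names are made up, and the assumption that the pattern is searched within the name (rather than matched against the whole name) should be checked against your Solr version:

```python
import re

# Pattern from the reply above: a literal dot followed by "zip" at the end of the name.
ZIP_NAME = re.compile(r"\.zip$")


def select_zip_files(names):
    """Return the names that the pattern selects, in their original order."""
    return [name for name in names if ZIP_NAME.search(name)]


# Hypothetical directory listing, for illustration only.
listing = ["logs-2013-06.zip", "notes.txt", "archive.zip.bak", "zipfile"]
print(select_zip_files(listing))
```

Escaping the dot matters here: an unescaped `.` would match any character, so a name like `myzip` would also be selected.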

Index pdf files.

2013-07-01 Thread archit2112
Hi, I'm new to Solr. I want to index pdf files using the Data Import Handler. I'm using Solr-4.3.0. I followed the steps given in this post http://lucene.472066.n3.nabble.com/indexing-with-DIH-and-with-problems-td3731129.html However, I get the following error - Full Import

Re: Index pdf files.

2013-07-01 Thread Shalin Shekhar Mangar
The tika jars are not in your classpath. You need to add all the jars inside contrib/extraction/lib directory to your classpath. On Mon, Jul 1, 2013 at 2:00 PM, archit2112 archit2...@gmail.com wrote: Hi, I'm new to Solr. I want to index pdf files using the Data Import Handler. I'm using

Re: Stemming query in Solr

2013-07-01 Thread snkar
Hi Erick, Thanks for the reply. Here is what the situation is: Relevant portion of Solr Schema: <field name="Content" type="text_general" indexed="false" stored="true" required="true"/> <field name="ContentSearch" type="text_general" indexed="true" stored="false" multiValued="true"/> <field

Set spellcheck field on query time?

2013-07-01 Thread Timo Schmidt
Hello all, we are currently working on a multi-language single core setup. During that I stumbled upon the question of whether it is possible to define different sources for the spellcheck. For now I only see the possibility to define different request handlers. Is it somehow possible to set the

Sum as a Projection for Facet Queries

2013-07-01 Thread samarth s
Hi, We have a need of finding the sum of a field for each facet.query. We have looked at StatsComponent http://wiki.apache.org/solr/StatsComponent but that supports only facet.field. Has anyone written a patch over StatsComponent that supports the same along with some performance measures? Is

Multiple groups of boolean queries in a single query.

2013-07-01 Thread samabhiK
Hello friends, I have a schema which contains various types of records of three different categories for ease of management and for making a single query to fetch all the data. The fields are grouped into three different types of records. For example: fields type 1: <field name="x_date" type="tdate"

Re: Multiple groups of boolean queries in a single query.

2013-07-01 Thread samabhiK
My entire concern is to be able to make a single query to fetch all the types of records. If I had to create three different cores for these different types of data, I would have to make 3 calls to solr to fetch the entire set of data. And I will have approx 15 such types in practice. Also, at

Re: Index pdf files.

2013-07-01 Thread archit2112
Hi Thanks a lot. I did what you said. Now I'm getting the following error. Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0 -- View this message in context:

Re: Set spellcheck field on query time?

2013-07-01 Thread Jan Høydahl
Check out http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.dictionary - you can define multiple dictionaries in the same handler, each with its own source field. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 1 July 2013 at 11:34, Timo Schmidt wrote
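As a rough illustration of the reply above: once several dictionaries are defined in one handler, the `spellcheck.dictionary` request parameter can select one per query. The sketch below only builds the query string; the dictionary names `spell_de` and `spell_en` are hypothetical and would have to match names configured in your own solrconfig.xml.

```python
from urllib.parse import urlencode, parse_qs


def spellcheck_params(query, dictionary):
    """Build request parameters that pick one spellcheck dictionary per request."""
    return urlencode({
        "q": query,
        "spellcheck": "true",
        # Must match a dictionary name configured in solrconfig.xml (assumed here).
        "spellcheck.dictionary": dictionary,
    })


# "spell_de" and "spell_en" are hypothetical dictionary names.
german = spellcheck_params("bücher", "spell_de")
english = spellcheck_params("books", "spell_en")
print(german)
print(english)
```

The application would choose the dictionary name from the language of the request, so a single request handler can serve all languages.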

Re: documentCache not used in 4.3.1?

2013-07-01 Thread Erick Erickson
Daniel: Soft commits invalidate the top level caches, which include things like filterCache, queryResultCache etc. Various segment-level caches are NOT invalidated, but you really don't have a lot of control from the Solr level over those anyway. But yeah, the tension between caching a bunch of

Re: Index pdf files.

2013-07-01 Thread Erick Erickson
OK, have you done anything custom? You get this where? solr logs? Echoed back in the browser? In response to what command? You haven't provided enough info to help us help you. You might review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Mon, Jul 1, 2013 at 6:08 AM, archit2112

Re: Index pdf files.

2013-07-01 Thread archit2112
I figured it out. It was a problem with the regular expression i used in data-config.xml . -- View this message in context: http://lucene.472066.n3.nabble.com/Index-pdf-files-tp4074278p4074304.html Sent from the Solr - User mailing list archive at Nabble.com.
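The thread does not show the expression that was at fault, but the error text ("Dangling meta character '*' near index 0") points at a pattern that begins with `*`, which is what happens when a shell-style glob is used where a regular expression is expected. A small sketch of the difference, using Python's `re` as a stand-in for Java's regex engine (the two engines word the error differently, but both reject the pattern):

```python
import re


def is_valid_regex(pattern):
    """Return True if the pattern compiles, False if the regex engine rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


glob_style = "*.pdf"      # shell glob: '*' has nothing before it to repeat
regex_style = r".*\.pdf"  # regex: any characters, then a literal ".pdf"

print(is_valid_regex(glob_style))
print(is_valid_regex(regex_style))
print(bool(re.fullmatch(regex_style, "report.pdf")))
```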

Re: Stemming query in Solr

2013-07-01 Thread Erick Erickson
bq: But looks like it is executing the search for an exact text based match with the stem burn. Right. You need to appreciate index time as opposed to query time stemming. Your field definition has both turned on. The admin/analysis page will help here G.. At index time, the terms are stemmed,

Re: Multiple groups of boolean queries in a single query.

2013-07-01 Thread Erick Erickson
Have you tried the query you indicated? Because it should just work barring syntax errors. The only other thing you might want is to turn on grouping by field type. That'll return separate sections by type, say the top 3 (default 1) documents in each type. If you don't group, you have the
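The grouping suggestion above can be sketched as a set of request parameters. This only builds the query string; the field name `record_type` and the query text are hypothetical, and `group.limit=3` mirrors the "top 3 (default 1)" remark in the reply.

```python
from urllib.parse import urlencode, parse_qs


def grouped_query(query, group_field, per_group=3):
    """Build parameters asking Solr to return results grouped by one field."""
    return urlencode({
        "q": query,
        "group": "true",
        "group.field": group_field,      # hypothetical field holding the record type
        "group.limit": str(per_group),   # documents returned per group
    })


params = grouped_query("x_date:[* TO NOW]", "record_type")
print(params)
```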

Shard tolerant partial results

2013-07-01 Thread Phil Hoy
Hi, When doing distributed searches with shards.tolerant set whilst the hosts for a slice are down and therefore the response is partial, how can that best be inferred, as we would like to not cache the results upstream and perhaps inform the end user in some way. I am aware that shards.info could be

Unique key error while indexing pdf files

2013-07-01 Thread archit2112
Hi, I'm trying to index pdf files in solr 4.3.0 using the data import handler. *My request handler - * <requestHandler name="/dataimport1" class="org.apache.solr.handler.dataimport.DataImportHandler"> <lst name="defaults"> <str name="config">data-config1.xml</str> </lst> </requestHandler> *My

Re: Unique key error while indexing pdf files

2013-07-01 Thread Jack Krupansky
It all depends on your data model - tell us more about your data model. For example, how will users or applications query these documents and what will they expect to be able to do with the ID/key for the documents? How are you expecting to identify documents in your data model? -- Jack

Re: Unique key error while indexing pdf files

2013-07-01 Thread archit2112
I'm new to solr. I'm just trying to understand and explore various features offered by solr and their implementations. I would be very grateful if you could solve my problem with any example of your choice. I just want to learn how I can index pdf documents using data import handler.

Re: RemoveDuplicatesTokenFilterFactory to avoid import duplicate values in multivalued field

2013-07-01 Thread tuedel
Hey, I have tried to make use of the UniqFieldsUpdateProcessorFactory in order to achieve distinct values in multivalued fields. Example below: <updateRequestProcessorChain name="uniq_fields"> <processor class="org.apache.solr.update.processor.UniqFieldsUpdateProcessorFactory"> <lst name="fields">

Re: Unique key error while indexing pdf files

2013-07-01 Thread Jack Krupansky
It's really 100% up to you how you want to come up with the unique key values for your documents. What would you like them to be? Just use that. Anything (within reason) - anything goes. But it also comes back to your data model. You absolutely must come up with a data model for how you

Re: Stemming query in Solr

2013-07-01 Thread snkar
So the general solution is to index the field twice, once with stemming and once without in order to have the ability to do both stemmed and exact matches I am already indexing the text twice using the ContentSearch and ContentSearchStemming fields. But what this allows me is to return

Re: RemoveDuplicatesTokenFilterFactory to avoid import duplicate values in multivalued field

2013-07-01 Thread Jack Krupansky
Your stated problem seems to have nothing to do with the message subject line relating to RemoveDuplicatesTokenFilterFactory. Please start a new message thread unless you really are concerned with an issue related to RemoveDuplicatesTokenFilterFactory. This kind of thread hijacking is

Re: Stemming query in Solr

2013-07-01 Thread snkar
I was just wondering if another solution might work. If we are able to extract the stem of the input search term (maybe using a C# based stemmer, some open source implementation of the Porter algorithm) for cases where the stemming option is selected, and submit the query to solr as a multiple

Re: Shard tolerant partial results

2013-07-01 Thread Mark Miller
On Jul 1, 2013, at 6:56 AM, Phil Hoy p...@brightsolid.com wrote: Perhaps an http header could be added or another attribute added to the solr result node. I thought that was already done - I'm surprised that it's not. If that's really the case, please make a JIRA issue. - Mark

Distinct values in multivalued fields

2013-07-01 Thread tuedel
Hello everybody, I have tried to make use of the UniqFieldsUpdateProcessorFactory in order to achieve distinct values in multivalued fields. Example below: <updateRequestProcessorChain name="uniq_fields"> <processor class="org.apache.solr.update.processor.UniqFieldsUpdateProcessorFactory">

Converting nested data model to solr schema

2013-07-01 Thread adfel70
Hi, I have the following data model: 1. Document (fields: doc_id, author, content) 2. Each Document has multiple attachment types. Each attachment type has multiple instances. And each attachment type may have different fields. For example: <doc> <doc_id>1</doc_id> <author>john</author>

Re: Distinct values in multivalued fields

2013-07-01 Thread Upayavira
Have a look at the DedupUpdateProcessorFactory, which may help you. Although, I'm not sure if it works with multivalued fields. Upayavira On Mon, Jul 1, 2013, at 02:34 PM, tuedel wrote: Hello everybody, i have tried to make use of the UniqFieldsUpdateProcessorFactory in order to achieve

Re: Converting nested data model to solr schema

2013-07-01 Thread Jack Krupansky
Simply duplicate a subset of the fields that you want to query of the parent document on each child document and then you can directly query the child documents without any join. Yes, given the complexity of your data, a two-step query process may be necessary for some queries - do one query

Re: Distinct values in multivalued fields

2013-07-01 Thread Jack Krupansky
Unfortunately, update processors only see the new, fresh, incoming data, not any existing document data. This is a case where your best bet may be to read the document first and then merge your new value into the existing list of values. -- Jack Krupansky -Original Message- From:
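The read-then-merge approach described above could look roughly like this on the client side. This is a sketch of the merge step only: fetching the existing document and re-sending it to Solr are left out, and the function name is made up for illustration.

```python
def merge_distinct(existing, incoming):
    """Combine stored and new values, dropping duplicates and keeping first-seen order."""
    seen = set()
    merged = []
    for value in list(existing) + list(incoming):
        if value not in seen:
            seen.add(value)
            merged.append(value)
    return merged


# Values already stored in the multivalued field, then the values arriving in an update.
stored_values = ["red", "green"]
new_values = ["green", "blue", "red"]
print(merge_distinct(stored_values, new_values))
```

The merged list would then be written back as the full value of the multivalued field, replacing the old list rather than appending to it.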

How to re-index Solr get term frequency within documents

2013-07-01 Thread Tony Mullins
Hi, I am using Solr 4.3.0. If I change my Solr schema.xml, do I need to re-index? And if yes, how? My 2nd question is that I need to find the frequency of a term per document in all documents of the search result. My field is <field name="CommentX" type="text_general" stored="true"

Re: ConcurrentUpdateSolrServer hanging

2013-07-01 Thread qungg
Hi, blockUntilFinished() sometimes blocks indefinitely. But if I send a commit from another thread to the instance, the ConcurrentUpdateSolrServer unblocks and sends the rest of the documents and commits. So the sequence looks like this: 1. adding documents as usual... 2. finish adding documents... 3. block

Re: How to re-index Solr get term frequency within documents

2013-07-01 Thread Jack Krupansky
You can write any function query in the field list of the fl parameter. Sounds like you want termfreq: termfreq(field_arg,term) fl=id,a,b,c,termfreq(a,xyz) -- Jack Krupansky -Original Message- From: Tony Mullins Sent: Monday, July 01, 2013 10:47 AM To: solr-user@lucene.apache.org
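A small sketch of the `fl` usage from the reply above, building the field list with a `termfreq` call appended. It follows the form of the example in the reply (`termfreq(a,xyz)`); the match-all query `*:*` is an assumption added only to make the parameter set complete.

```python
from urllib.parse import urlencode, parse_qs


def termfreq_fl(fields, tf_field, term):
    """Return an fl value listing the given fields plus a termfreq() pseudo-field."""
    return ",".join(list(fields) + [f"termfreq({tf_field},{term})"])


fl_value = termfreq_fl(["id", "a", "b", "c"], "a", "xyz")
query_string = urlencode({"q": "*:*", "fl": fl_value})
print(fl_value)
print(query_string)
```

Each returned document then carries the term's frequency in that field alongside the regular stored fields.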

Concurrent Modification Exception

2013-07-01 Thread adityab
Hi, I have recently upgraded from Solr 3.5 to 4.2.1. We have also added the spellcheck feature to our search query. During our performance testing we have observed that for every 2000 requests, 1 request fails. The exception we observe in the solr log is ConcurrentModificationException. Below is the

Re: documentCache not used in 4.3.1?

2013-07-01 Thread Daniel Collins
Regrettably, visibility is key for us :( Documents must be searchable as soon as they have been indexed (or as near as we can make it). Our old search system didn't do relevance sort, it was time-ordered (so it had a much simpler job) but it did have sub-second latency, and that is what is

Does solr cloud required passwordless ssh?

2013-07-01 Thread adfel70
Hi, does solr cloud on a cluster of servers require passwordless ssh to be configured between the servers?

Re: dataconfig to index ZIP Files

2013-07-01 Thread ericrs22
To answer the previous post: I was not sure what datasource=binaryFile was; I took it from a PDF sample thinking that would help. After setting datasource=null I'm still getting the same errors... <dataConfig> <dataSource type="BinFileDataSource" user="svcSolr" password="SomePassword" /> <document>

Re: cores sharing an instance

2013-07-01 Thread Roman Chyla
as for the second option: If you look inside SolrResourceLoader, you will notice that before a CoreContainer is created, a new class loader is also created line:111 this.classLoader = createClassLoader(null, parent); however, this parent object is always null, because it is called from:

Re: Does solr cloud required passwordless ssh?

2013-07-01 Thread Mark Miller
No, SolrCloud does not currently use ssh. - Mark On Jul 1, 2013, at 12:58 PM, adfel70 adfe...@gmail.com wrote: Hi Does solr cloud on a cluster of servers require passwordless ssh to be configured between the servers? -- View this message in context:

Re: How to re-index Solr get term frequency within documents

2013-07-01 Thread Tony Mullins
Thanks Jack, it worked. Could you please provide some info on how to re-index existing data in Solr after changing the schema.xml? Thanks, Tony On Mon, Jul 1, 2013 at 8:21 PM, Jack Krupansky j...@basetechnology.com wrote: You can write any function query in the field list of the fl

Re: dataconfig to index ZIP Files

2013-07-01 Thread Noble Paul നോബിള്‍ नोब्ळ्
IIRC Zip files are not supported On Mon, Jul 1, 2013 at 10:30 PM, ericrs22 ericr...@yahoo.com wrote: To answer the previous post: I was not sure what datasource=binaryFile was; I took it from a PDF sample thinking that would help. After setting datasource=null I'm still getting the same errors...

Re: dataconfig to index ZIP Files

2013-07-01 Thread ericrs22
I'm using the Tika plugin to do so and according to http://tika.apache.org/0.5/formats.html it does *ZIP archive (application/zip) Tika uses Java's built-in Zip classes to parse ZIP files. Support for ZIP was added in Tika 0.2.* -- View this message in context:

are fields stored or unstored by default xml

2013-07-01 Thread Katie McCorkell
In schema.xml I know you can label a field as stored=false or stored=true, but if you say neither, which is it by default? Thank you Katie

Re: are fields stored or unstored by default xml

2013-07-01 Thread Otis Gospodnetic
Haven't tried it recently, but is that even legal? Just be explicit :) Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Mon, Jul 1, 2013 at 2:16 PM, Katie McCorkell katiemccork...@gmail.com wrote: In schema.xml I know you can

Re: How to re-index Solr get term frequency within documents

2013-07-01 Thread Otis Gospodnetic
If all your fields are stored, you can do it with http://search-lucene.com/?q=solrentityprocessor Otherwise, just reindex the same way you indexed in the first place. *Always* be ready to reindex from scratch. Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring --

Re: FileDataSource vs JdbcDataSouce (speed) Solr 3.5

2013-07-01 Thread Mike L.
Hey Ahmet / Solr User Group, I tried using the built in UpdateCSV and it runs A LOT faster than a FileDataSource DIH as illustrated below. However, I am a bit confused about the numDocs/maxDoc values when doing an import this way. Here's my GET command against a tab-delimited file: (I

Re: Classic 4.2 master-slave replication not completing

2013-07-01 Thread Neal Ensor
is it conceivable that there's too much traffic, causing Solr to stall re-opening the searcher (thus releasing to the new index)? I'm grasping at straws, and this is beginning to bug me a lot. The traffic logs wouldn't seem to support this (apart from periodic health-check pings, the load is

Perf. difference when the solr core is 'current' or not 'current'

2013-07-01 Thread jchen2000
in Solr's admin statistics page, there is a 'current' flag indicating whether the core index reader is 'current' or not. According to some discussions in this mailing list a few months back, it wouldn't affect anything. But my observation is completely different. When the current flag was not

Re: FileDataSource vs JdbcDataSouce (speed) Solr 3.5

2013-07-01 Thread Shawn Heisey
On 7/1/2013 12:56 PM, Mike L. wrote: Hey Ahmet / Solr User Group, I tried using the built in UpdateCSV and it runs A LOT faster than a FileDataSource DIH as illustrated below. However, I am a bit confused about the numDocs/maxDoc values when doing an import this way. Here's my Get

Re: are fields stored or unstored by default xml

2013-07-01 Thread Jack Krupansky
stored and indexed both default to true. This is legal: <field name="alpha" type="string" /> This detail will be in Early Access Release #2 of my book on Friday. -- Jack Krupansky -Original Message- From: Otis Gospodnetic Sent: Monday, July 01, 2013 2:21 PM To:

Re: are fields stored or unstored by default xml

2013-07-01 Thread Yonik Seeley
On Mon, Jul 1, 2013 at 3:50 PM, Jack Krupansky j...@basetechnology.com wrote: stored and indexed both default to true. This is legal: <field name="alpha" type="string" /> Actually, for fields I believe the defaults come from the fieldType. The fieldType defaults to true for both indexed and

Re: Classic 4.2 master-slave replication not completing

2013-07-01 Thread Shawn Heisey
On 7/1/2013 1:07 PM, Neal Ensor wrote: is it conceivable that there's too much traffic, causing Solr to stall re-opening the searcher (thus releasing to the new index)? I'm grasping at straws, and this is beginning to bug me a lot. The traffic logs wouldn't seem to support this (apart from

Re: are fields stored or unstored by default xml

2013-07-01 Thread Jack Krupansky
Correct - the field definitions inherit the attributes of the field type, and it is the field type that has the actual default values for indexed and stored (and other attributes.) -- Jack Krupansky -Original Message- From: Yonik Seeley Sent: Monday, July 01, 2013 3:56 PM To:
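The inheritance described above can be pictured with a toy model: built-in defaults first, then whatever the field type sets, then whatever the field itself sets. This is an illustration of the lookup order as stated in the thread, not Solr's actual implementation, and all names in it are made up.

```python
# Built-in defaults as stated in the thread: indexed and stored are both true.
BUILT_IN_DEFAULTS = {"indexed": True, "stored": True}


def effective_attributes(field_type_attrs, field_attrs):
    """Resolve a field's attributes: built-in defaults < field type < field."""
    effective = dict(BUILT_IN_DEFAULTS)
    effective.update(field_type_attrs)  # the field type overrides the built-in defaults
    effective.update(field_attrs)       # the field overrides its type
    return effective


# A field that sets nothing, on a type that sets nothing: both default to True.
print(effective_attributes({}, {}))
# A type that turns storing off, with a field that turns it back on.
print(effective_attributes({"stored": False}, {"stored": True}))
```

Being explicit on each field, as suggested earlier in the thread, avoids having to trace this chain when reading a schema.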

Re: How to re-index Solr get term frequency within documents

2013-07-01 Thread Jack Krupansky
Or, go with a commercial product that has a single-click Solr re-index capability, such as: 1. DataStax Enterprise - data is stored in Cassandra and reindexed into Solr from there. 2. LucidWorks Search - data sources are declared so that the package can automatically re-crawl the data

Using per-segment FieldCache or DocValues in custom component?

2013-07-01 Thread Michael Ryan
I have some custom code that uses the top-level FieldCache (e.g., FieldCache.DEFAULT.getLongs(reader, "foobar", false)). I'd like to redesign this to use the per-segment FieldCaches so that re-opening a Searcher is fast(er). In most cases, I've got a docId and I want to get the value for a

Re: Improving performance to return 2000+ documents

2013-07-01 Thread Utkarsh Sengar
Thanks Erick/Jagdish. Just to give some background on my queries. 1. All my queries are unique. A query can be: ipod and ipod 8gb (but these are unique). These are about 1.2M in total. So, I assume setting a high queryResultCache, queryResultWindowSize and queryResultMaxDocsCached won't help.

Disable Document Id from being printed in the logs...

2013-07-01 Thread Niran Fajemisin
Hi all, I noticed that for Solr 4.2, when an internal call is made between two nodes Solr uses the list of matching document ids to fetch the document details. At this time, it prints out all matching document ids as a part of the query. Is there a way to suppress these log statements from

Re: full-import failed after 5 hours with Exception: ORA-01555: snapshot too old: rollback segment number with name too small ORA-22924: snapshot too old

2013-07-01 Thread Michael Della Bitta
I would say definitely investigate the performance of the query, but also since you're using CachedSqlEntityProcessor, you might want to back off on the transaction isolation to READ_COMMITTED, which I think is the lowest one that Oracle supports:

Re: Disable Document Id from being printed in the logs...

2013-07-01 Thread Shawn Heisey
On 7/1/2013 3:24 PM, Niran Fajemisin wrote: I noticed that for Solr 4.2, when an internal call is made between two nodes Solr uses the list of matching document ids to fetch the document details. At this time, it prints out all matching document ids as a part of the query. Is there a way to

Re: dataconfig to index ZIP Files

2013-07-01 Thread ericrs22
not sure if this will help any. Here's the verbose log INFO - 2013-07-01 23:17:08.632; org.apache.solr.handler.dataimport.DataImporter; Loading DIH Configuration: tika-data-config.xml INFO - 2013-07-01 23:17:08.648; org.apache.solr.handler.dataimport.DataImporter; Data Configuration loaded

Re: Converting nested data model to solr schema

2013-07-01 Thread Mikhail Khludnev
On Mon, Jul 1, 2013 at 5:56 PM, adfel70 adfe...@gmail.com wrote: This requires me to override the solr document distribution mechanism. I fear that with this solution I may lose some of solr cloud's capabilities. It's not clear whether you are aware of

Re: Schema design for parent child field

2013-07-01 Thread Mikhail Khludnev
From my experience, deeply nested scopes are almost exclusively a case for SOLR-3076. On Sat, Jun 29, 2013 at 1:08 PM, Sperrink kevin.sperr...@lexisnexis.co.za wrote: Good day, I'm seeking some guidance on how best to represent the following data within a solr schema. I have a list of subjects which are