Re: integrating Accumulo with solr

2014-07-30 Thread Ali Nazemian
Sure, thank you very much for your guidance. I think I am not that kind of gunslinger, and I will probably go for another NoSQL store that can be integrated with Solr/Elasticsearch much more easily :) Best regards. On Sun, Jul 27, 2014 at 5:02 PM, Jack Krupansky j...@basetechnology.com wrote: Right, and

Re: Bloom filter

2014-07-30 Thread jim ferenczi
Hi Per, First of all, the BloomFilter implementation in Lucene is not exactly a Bloom filter. It uses only one hash function and you cannot set the false positive ratio beforehand. Elasticsearch has its own Bloom filter implementation (using a Guava-like BloomFilter); you should take a look at their
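To make the distinction in this message concrete, here is a minimal sketch of a textbook Bloom filter in which the number of bits and the number of hash functions are derived from a target false positive ratio. The class name SimpleBloomFilter, the FNV-1a hash, and the double-hashing scheme are assumptions chosen for illustration; this is not the Lucene, Elasticsearch, or Guava implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical illustration class; not Lucene's, Elasticsearch's, or Guava's code.
class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(long expectedItems, double falsePositiveRate) {
        this.numBits = optimalBits(expectedItems, falsePositiveRate);
        this.numHashes = optimalHashes(expectedItems, numBits);
        this.bits = new BitSet(numBits);
    }

    // Standard sizing formula: m = ceil(-n * ln(p) / (ln 2)^2)
    static int optimalBits(long n, double p) {
        double ln2 = Math.log(2);
        return (int) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
    }

    // Standard sizing formula: k = round(m / n * ln 2), but at least one hash
    static int optimalHashes(long n, int m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    // 64-bit FNV-1a over the UTF-8 bytes of the key (an assumed hash choice)
    private static long fnv1a64(String key) {
        long hash = 0xcbf29ce484222325L;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);
            hash *= 0x100000001b3L;
        }
        return hash;
    }

    // Double hashing: the i-th index is derived from the two 32-bit halves of one hash
    private int bitIndex(long hash, int i) {
        int h1 = (int) hash;
        int h2 = (int) (hash >>> 32);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String key) {
        long hash = fnv1a64(key);
        for (int i = 0; i < numHashes; i++) {
            bits.set(bitIndex(hash, i));
        }
    }

    // May return true for a key that was never added, never false for an added key
    boolean mightContain(String key) {
        long hash = fnv1a64(key);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(bitIndex(hash, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1000, 0.01);
        filter.add("doc-42");
        System.out.println(filter.mightContain("doc-42")); // true: added keys are never reported absent
    }
}
```

With 1000 expected items and a 1% target ratio, the formulas give 9586 bits and 7 hash functions. A filter with a single hash function, as described in the message, cannot be tuned this way.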

Re: SolrCloud without NRT and indexing only on the master

2014-07-30 Thread Harald Kirsch
Thanks Erick, for the confirmation. You say traditional but the docs call it legacy. Not a native speaker I might misinterpret the meaning slightly but to me it conveys the notion of don't use this stuff if you don't have to. SolrCloud indexes to all nodes all the time, there's no real way

Re: Bloom filter

2014-07-30 Thread Per Steffensen
On 30/07/14 08:55, jim ferenczi wrote: Hi Per, First of all the BloomFilter implementation in Lucene is not exactly a bloom filter. It uses only one hash function and you cannot set the false positive ratio beforehand. ElasticSearch has its own bloom filter implementation (using guava like

Searching and highlighting ten's of fields

2014-07-30 Thread Manuel Le Normand
Hello, I need to expose the search and highlighting capabilities over a few tens of fields. The edismax qf param makes it possible, but the performance when searching tens of words over tens of fields is problematic. I made a copyField (indexed, not stored) for these fields, which gives way

Re: Searching and highlighting ten's of fields

2014-07-30 Thread aurelien . mazoyer
Hello, Do you use classic highlighter or fast vector highlighter? Aurélien On 30.07.2014 09:36, Manuel Le Normand wrote: Hello, I need to expose the search and highlighting capabilities over few tens of fields. The edismax's qf param makes it possible but the time performances for

Re: Bloom filter

2014-07-30 Thread Shalin Shekhar Mangar
Hi Per, There's LUCENE-5675 which has added a new postings format for IDs. Trying it out in Solr is in my todo list but maybe you can get to it before me. https://issues.apache.org/jira/browse/LUCENE-5675 On Wed, Jul 30, 2014 at 12:57 PM, Per Steffensen st...@designware.dk wrote: On 30/07/14

SOLR Schema add constant prefix to field value

2014-07-30 Thread Eichstädt , Konrad
Dear Solr User Group, I need your help configuring the Solr schema properly. Here is what I would like to do. I have the following field in the schema: <field name="url" type="string" indexed="true" stored="true"/> Now I would like to have the same field value with a constant prefix, like: <field name="wayback_url"

Search result at next component

2014-07-30 Thread Lee Chunki
Hi, I am building a new component that runs a new query depending on the previous query's result. The solrconfig.xml setting looks like this: <arr name="components"> <str>query</str> <str>newComponent</str> <str>facet</str> <str>mlt</str> <str>highlight</str> <str>stats</str>

Re: Bloom filter

2014-07-30 Thread Shalin Shekhar Mangar
I opened https://issues.apache.org/jira/browse/SOLR-6301 On Wed, Jul 30, 2014 at 1:35 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: Hi Per, There's LUCENE-5675 which has added a new postings format for IDs. Trying it out in Solr is in my todo list but maybe you can get to it

Re: SOLR Schema add constant prefix to field value

2014-07-30 Thread Alexandre Rafalovitch
On Wed, Jul 30, 2014 at 3:21 PM, Eichstädt, Konrad konrad.eichsta...@sbb.spk-berlin.de wrote: Now I would have the same field value with a constant prefix like: <field name="wayback_url" type="string" indexed="false" stored="true"/> Your source value in the Clone URP is mis-spelt. So that might be part

PeerSync: too many updates received since start - startingUpdates no longer overlaps with our currentUpdates

2014-07-30 Thread 汤林
Hi all, I ran into an issue when sending lots of docs to a 2-node SolrCloud. My environment has one collection on 2 nodes. The collection has 2 shards with 2 replicas of each shard. We are using Solr 4.7. We found this warning while sending docs to the SolrCloud. And we noticed one

Re: Searching and highlighting ten's of fields

2014-07-30 Thread Manuel Le Normand
Currently I use the classic highlighter, but I can change my postings format in order to work with another highlighting component if that leads to any solution

Re: SolrCloud without NRT and indexing only on the master

2014-07-30 Thread Daniel Collins
Working backwards slightly, what do you think SolrCloud is going to give you, apart from the consistency of the index (which you want to turn off)? What are all the other benefits of SolrCloud, if you are querying separate instances that aren't guaranteed to be in sync (since you want to use the

Query on Facet

2014-07-30 Thread Smitha Rajiv
Hi, I need some help on Solr Faceting. How do I facet on two fields at the same time to get combination facets and its count? I'm using below query to get facets with combination of language and its binding. But now I'm getting only selected facet in facetList of each field and its count.

Re: Query on Facet

2014-07-30 Thread Alexandre Rafalovitch
I am not sure I fully understood your question, but I would start by looking at Tagging and Excluding first: https://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters Regards, Alex.

Re: SolrCloud without NRT and indexing only on the master

2014-07-30 Thread Harald Kirsch
Hi Daniel, well, I assume there is a performance difference on host B between a) getting some ready-made segments from host A (master, taking care of indexing) to host B (slave, taking care of answering queries) and b) host B (along with host A) doing all the work necessary to prepare

Ranking based on match position in field

2014-07-30 Thread Thomas Michael Engelke
Hi, an example. We have 2 records with this data in the same field (description): 1: Lufthutze vor Kühler Bj 62-65, DS 2: Kühler HY im Austausch, Altteilpfand 250 Euro A search with the parameters 'description:Kühler' does provide this debug: 2.3234584 = (MATCH) weight(description:kühler in

Re: Bloom filter

2014-07-30 Thread Per Steffensen
Hi I am not sure exactly what LUCENE-5675 does, but reading the description it seems to me that it would help finding out that there is no document (having an id-field) where version-field is less than some-version. As far as I can see this will not help finding out if a document with

Re: Ranking based on match position in field

2014-07-30 Thread Ahmet Arslan
Hi, Please see : https://issues.apache.org/jira/browse/SOLR-3925 Ahmet On Wednesday, July 30, 2014 2:39 PM, Thomas Michael Engelke thomas.enge...@posteo.de wrote: Hi, an example. We have 2 records with this data in the same field (description): 1: Lufthutze vor Kühler Bj 62-65, DS 2:

Tika analyzers

2014-07-30 Thread Tommaso Teofili
Hi all, while SolrCell works nicely when in need of indexing binary documents, I am wondering about the possibility of having Lucene / Solr documents that have binaries in specific Lucene fields, e.g. title=a nice doc, name=blabla.doc, binary=0x1234 In that case the binary field should have

Search on Date Field

2014-07-30 Thread Pbbhoge
In my Solr index there is a date field (published_date) whose values are in this format: 2012-09-26T10:08:09.123Z. How can I search by simple input like 2012-09-10 instead of the full ISO date format? Is it possible in Solr?

Identify specific document insert error inside a solrj batch request

2014-07-30 Thread Liram Vardi
Hi All, I have a question regarding the use of HttpSolrServer (SolrJ). I have a collection of SolrInputDocuments I want to send to Solr as a batch. Now, let's assume that one of the docs inside this collection is corrupted (missing some required field). When I send the batch of docs to solr

Index size increase after upgrade to 4.9?

2014-07-30 Thread Shawn Heisey
Yesterday I upgraded my dev server to Solr 4.9, and also upgraded a third-party plugin to a new version that's compatible with Solr 4.9. After the index was rebuilt, each shard was 28GB ... but before the upgrade, each shard was only 20GB. The number of documents per shard (16.4 million)

Re: Query on Facet

2014-07-30 Thread vamshi kiran
Hi Alex, As you said, if we exclude the language facet field, it will get all the language facets with counts, right? It will not filter by the binding facet field of type 'paperback'; how can we do this? Thanks Regards, Vamshi. On Jul 30, 2014 4:11 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

Re: Searching words with spaces for word without spaces in solr

2014-07-30 Thread sunshine glass
This is the new configuration: <fieldType name="text" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <charFilter class="solr.HTMLStripCharFilterFactory"/> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.ShingleFilterFactory"

Re: Query on Facet

2014-07-30 Thread Sujit Pal
Hi Smitha, Have you looked at Facet queries? It allows you to attach Solr queries to facets. The problem with this is that you will need to know all possible combinations of language and binding (or make an initial query to find this information).

Re: Identify specific document insert error inside a solrj batch request

2014-07-30 Thread Jack Krupansky
Agreed that this is a problem with Solr. If it was merely bad input, Solr should be returning a 4xx error. I don't know if we already have a Jira for this. If not, one should be filed. There are two issues: 1. The status code should be 4xx with an appropriate message about bad input. 2.

Re: Searching words with spaces for word without spaces in solr

2014-07-30 Thread sunshine glass
This is the analysis page: Please help me now. On Wed, Jul 30, 2014 at 8:08 PM, sunshine glass sunshineglassof2...@gmail.com wrote: This is the new configuration: <fieldType name="text" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <charFilter

Re: Searching and highlighting ten's of fields

2014-07-30 Thread Erick Erickson
Doesn't hl.fl work in this case? Or is highlighting the 10 fields the slowdown? Best, Erick On Wed, Jul 30, 2014 at 2:55 AM, Manuel Le Normand manuel.lenorm...@gmail.com wrote: Current I use the classic but I can change my posting format in order to work with another highlighting component

Re: SolrCloud without NRT and indexing only on the master

2014-07-30 Thread Erick Erickson
Sorry for the confusion between legacy and traditional, it's just sloppy terminology. There's no sense of don't use this with traditional M/R replication. In fact, when SolrCloud nodes need to catch up with their indexes if they're very out of sync, this is still used. So it's definitely

Re: Tika analyzers

2014-07-30 Thread Erick Erickson
Hmmm, might a custom update processor do that? In an update processor, you'd get the binary and be able to do anything at all you wanted to with that. I'm not quite clear on how the binary gets through the Tika bits and gets passed in in the first place, but Best, Erick On Wed, Jul 30, 2014

Re: Index size increase after upgrade to 4.9?

2014-07-30 Thread Erick Erickson
I assume you've optimized? Or otherwise ensured that there aren't any deleted docs? Best, Erick On Wed, Jul 30, 2014 at 6:27 AM, Shawn Heisey s...@elyograg.org wrote: Yesterday I upgraded my dev server to Solr 4.9, and also upgraded a third-party plugin to a new version that's compatible

Re: Index size increase after upgrade to 4.9?

2014-07-30 Thread Shawn Heisey
On 7/30/2014 9:10 AM, Erick Erickson wrote: I assume you've optimized? Or otherwise ensured that there aren't any deleted docs? It's all straight indexing with DIH from MySQL, so there really are no deleted docs, but about an hour after the rebuild finished, one of the shards did get

Re: Tika analyzers

2014-07-30 Thread Alexandre Rafalovitch
Solr effectively supports only one binary document that gets indexed. This is because you are not actually indexing the document. You are extracting metadata (e.g. Author) and content fields out of it and map it to the Solr document. So, it makes no sense to have two fields that are binary because

Re: Index size increase after upgrade to 4.9?

2014-07-30 Thread Shawn Heisey
On 7/30/2014 9:16 AM, Shawn Heisey wrote: On 7/30/2014 9:10 AM, Erick Erickson wrote: I assume you've optimized? Or otherwise ensured that there aren't any deleted docs? It's all straight indexing with DIH from MySQL, so there really are no deleted docs, but about an hour after the rebuild

Re: Index size increase after upgrade to 4.9?

2014-07-30 Thread Shawn Heisey
On 7/30/2014 10:00 AM, Shawn Heisey wrote: It may turn out that this is actually a bug in merging, where old segments are not getting deleted. I noticed in the optimized index that there is a single large segment of about 20GB and a bunch of other segments that are all older than the single

Re: Bloom filter

2014-07-30 Thread Shalin Shekhar Mangar
You're right. I misunderstood. I thought that you wanted to optimize the finding by id path which is typically done for comparing versions during inserts in Solr. Yes, it won't help with the case where the ID does not exist. On Wed, Jul 30, 2014 at 6:14 PM, Per Steffensen st...@designware.dk

Implementing custom analyzer for multi-language stemming

2014-07-30 Thread Eugene
Hello, fellow Solr and Lucene users and developers! In our project we receive text from users in different languages. We detect language automatically and use Google Translate APIs a lot (so having arbitrary number of languages in our system doesn't concern us). However we need to be able

re: Implementing custom analyzer for multi-language stemming

2014-07-30 Thread Chris Morley
I know BasisTech.com has a plugin for elasticsearch that extends stemming/lemmatization to work across 40 natural languages. I'm not sure what they have for Solr, but I think something like that may exist as well. Cheers, -Chris. From: Eugene

Exception : Processing of multipart/form-data request failed.

2014-07-30 Thread Ameya Aware
Hi, I am getting the exception "Processing of multipart/form-data request failed." My solrconfig.xml contains: <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="512" formdataUploadLimitInKB="2048"

Re: Copy existing index from standalone Solr to Solr cloud

2014-07-30 Thread avgxm
Used the admin/collections?action=SPLITSHARD, to create shard1_0, shard1_1, and then followed this thread http://lucene.472066.n3.nabble.com/How-can-you-move-a-shard-from-one-SolrCloud-node-to-another-td4106815.html to move the shards to the right nodes. Problem solved.

Re: Implementing custom analyzer for multi-language stemming

2014-07-30 Thread Sujit Pal
Hi Eugene, In a system we built couple of years ago, we had a corpus of English and French mixed (and Spanish on the way but that was implemented by client after we handed off). We had different fields for each language. So (title, body) for English docs was (title_en, body_en), for French
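The per-language field naming described in this message can be sketched as a small helper that maps a base field name and a detected language code to a suffixed field name. Only the English names (title_en, body_en) appear in the message; the class name LanguageFields, the two-letter code check, and the use of the same suffix pattern for other languages are assumptions made for illustration.

```java
import java.util.Locale;

// Hypothetical helper; the "_<code>" pattern is taken from title_en / body_en in the message.
class LanguageFields {

    // Returns e.g. "title_en" for ("title", "en"); rejects anything that is not a two-letter code.
    static String fieldFor(String baseField, String languageCode) {
        String code = languageCode.trim().toLowerCase(Locale.ROOT);
        if (!code.matches("[a-z]{2}")) {
            throw new IllegalArgumentException("Expected a two-letter language code: " + languageCode);
        }
        return baseField + "_" + code;
    }

    public static void main(String[] args) {
        System.out.println(fieldFor("title", "en")); // title_en
        System.out.println(fieldFor("body", "EN"));  // body_en
    }
}
```

Each suffixed field can then be given its own language-specific analysis chain in the schema, which is presumably the reason for keeping the languages in separate fields.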

Avoiding indexing of hidden folders and files

2014-07-30 Thread Ameya Aware
Hi, I noticed a fact that Solr indexes all the folders and files including hidden files. Can anyone help me with avoiding indexing of hidden files? Thanks, Ameya

Re: Avoiding indexing of hidden folders and files

2014-07-30 Thread Ahmet Arslan
Hi Ameya, Did you mean to post to the ManifoldCF user mailing list? Or are you referring to the java -jar post.jar utility? Ahmet On Wednesday, July 30, 2014 11:15 PM, Ameya Aware ameya.aw...@gmail.com wrote: Hi, I noticed a fact that Solr indexes all the folders and files including hidden files. Can anyone

Re: Searching and highlighting ten's of fields

2014-07-30 Thread Manuel Le Normand
The slowdown occurs during search, not highlighting. Having a disjunctive query with 50 terms running 20 different posting lists is a hard task. Harder than searching these 50 terms on a single (larger) posting list as in the copyField case. With the edismax qf param, sure, hl.fl=* works as it

Re: Searching and highlighting ten's of fields

2014-07-30 Thread Erick Erickson
bq: Is there a way to search the global copyField but highlight the original stored fields? That's what I was suggesting. Specify the global field for your search, but use hl.fl for fields you want to copy. And yes, storing the fields is required for highlighting. Consider stemming (or worse,

Re: Search result at next component

2014-07-30 Thread Ahmet Arslan
Hi Lee, You can use: final DocList docList = rb.getResults().docList; And if you want to access individual field values, use SolrPluginUtils' static method to obtain a SolrDocumentList: SolrDocumentList solrDocs = docListToSolrDocumentList(rb.getResults().docList, req.getSearcher(), fields);

Setting a Key/Tag/Label for each group.query Result Set

2014-07-30 Thread Carlos Maroto
Hi, I'm trying to get results in a single Solr call through multiple group.query definitions. I'm getting the results I want but, each group is presented under a name consisting of the query used for that group. I'd like to change the name of each group to some meaningful name instead. I'm

Index a time/date range

2014-07-30 Thread Ryan Cutter
Is there a way to index time or date ranges? That is, assume 2 docs: #1: date = 2014-01-01 #2: date = 2014-02-01 through 2014-05-01 Would there be a way to index #2's date as a single field and have all the search options you usually get with time/date? One strategy could be to index the start

Re: Index a time/date range

2014-07-30 Thread Alexandre Rafalovitch
For fancier versions, some people used geo coordinates to represent start on X axis and stop on Y. Then use perimeter bounds to do overlaps. There was a discussion on the list about that a while ago. Regards, Alex On 31/07/2014 6:26 am, Ryan Cutter ryancut...@gmail.com wrote: Is there a
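The idea in this reply can be sketched without any Solr API: a duration becomes a 2D point with its start on the X axis and its end on the Y axis, and an "overlaps" query becomes a rectangle test on that point. The class DurationPoint and its methods are hypothetical names for illustration only; the actual field configuration and query syntax in Solr are not shown here.

```java
import java.time.LocalDate;

// Hypothetical illustration of "start on X, end on Y"; not Solr's spatial API.
class DurationPoint {
    final long x; // start of the duration, in epoch days
    final long y; // end of the duration, in epoch days

    DurationPoint(LocalDate start, LocalDate end) {
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("end must not be before start");
        }
        this.x = start.toEpochDay();
        this.y = end.toEpochDay();
    }

    // Plain point-in-rectangle test with inclusive bounds
    static boolean inRect(DurationPoint p, long xMin, long xMax, long yMin, long yMax) {
        return p.x >= xMin && p.x <= xMax && p.y >= yMin && p.y <= yMax;
    }

    // A duration overlaps [qStart, qEnd] when it starts on or before qEnd (X bound)
    // and ends on or after qStart (Y bound); the other two sides are left open.
    static boolean overlaps(DurationPoint doc, LocalDate qStart, LocalDate qEnd) {
        return inRect(doc, Long.MIN_VALUE, qEnd.toEpochDay(), qStart.toEpochDay(), Long.MAX_VALUE);
    }

    public static void main(String[] args) {
        // The two docs from the question: a single day and a three-month range
        DurationPoint doc1 = new DurationPoint(LocalDate.of(2014, 1, 1), LocalDate.of(2014, 1, 1));
        DurationPoint doc2 = new DurationPoint(LocalDate.of(2014, 2, 1), LocalDate.of(2014, 5, 1));
        LocalDate qStart = LocalDate.of(2014, 3, 1);
        LocalDate qEnd = LocalDate.of(2014, 3, 31);
        System.out.println(overlaps(doc1, qStart, qEnd)); // false
        System.out.println(overlaps(doc2, qStart, qEnd)); // true
    }
}
```

Other relations, such as "contains" or "within", would correspond to different rectangles over the same points.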

Re: Index a time/date range

2014-07-30 Thread Jost Baron
Hi Ryan, On 07/31/2014 01:26 AM, Ryan Cutter wrote: Is there a way to index time or date ranges? That is, assume 2 docs: #1: date = 2014-01-01 #2: date = 2014-02-01 through 2014-05-01 Would there be a way to index #2's date as a single field

Re: Character encoding problems

2014-07-30 Thread Gulliver Smith
Thanks for all the replies - I should have made clear that the first thing I did was confirm that everything on the PHP side is UTF-8. The web pages, the input text, the input files etc. The browser confirms that the encoding is UTF-8 for all of the web pages, the response headers as inspected by

Re: Index a time/date range

2014-07-30 Thread david.w.smi...@gmail.com
The wiki page on the technique cleans up some small errors from Hoss’s presentation: http://wiki.apache.org/solr/SpatialForTimeDurations But please try Solr trunk which has first-class support for date durations: https://issues.apache.org/jira/browse/SOLR-6103 Soonish I’ll back-port to 4x. ~

Re: Search result at next component

2014-07-30 Thread Lee Chunki
Hi Ahmet, it’s working :) Thank you Chunki. On Jul 31, 2014, at 7:48 AM, Ahmet Arslan iori...@yahoo.com.INVALID wrote: Hi Lee, You can use: final DocList docList = rb.getResults().docList; And if you want to access individual field values, use SolrPluginUtils' static method to

Re: Query on Facet

2014-07-30 Thread Smitha Rajiv
Hi All, We have tried both exclude option as well as facet query. Both approach are not giving us the desired results. I will explain a little further. I have first level facets - Paperback and Ebook, and second level facets include a list of languages like English, French etc.. When user

Querying from solr shards

2014-07-30 Thread Smitha Rajiv
Hi All, Currently I am using Solr's legacy distributed configuration (not SolrCloud; a single Solr server with multiple shards). I need to write a query to get one particular document (by id) from one shard and all documents from the other shards. Can you please help me get this query right?

Re: Query on Facet

2014-07-30 Thread Alexandre Rafalovitch
Now it sounds like maybe you have nested facets as opposed to just different ones. See if one of these fits your use case better: http://wiki.apache.org/solr/HierarchicalFaceting Regards, Alex.