Sure,
Thank you very much for your guidance. I think I am not that kind of gunslinger
and will probably go for another NoSQL store that can be integrated with
Solr/Elasticsearch much more easily :)
Best regards.
On Sun, Jul 27, 2014 at 5:02 PM, Jack Krupansky j...@basetechnology.com
wrote:
Right, and
Hi Per,
First of all the BloomFilter implementation in Lucene is not exactly a
bloom filter. It uses only one hash function and you cannot set the false
positive ratio beforehand. ElasticSearch has its own bloom filter
implementation (using guava like BloomFilter), you should take a look at
their
Thanks Erick,
for the confirmation.
You say "traditional" but the docs call it "legacy". Not being a native
speaker, I might misinterpret the meaning slightly, but to me it conveys
the notion of "don't use this stuff if you don't have to".
SolrCloud indexes to all nodes all the time, there's no real way
On 30/07/14 08:55, jim ferenczi wrote:
Hi Per,
First of all the BloomFilter implementation in Lucene is not exactly a
bloom filter. It uses only one hash function and you cannot set the false
positive ratio beforehand. ElasticSearch has its own bloom filter
implementation (using guava like
Hello,
I need to expose search and highlighting capabilities over a few tens of
fields. The edismax qf param makes this possible, but the performance
of searching tens of words over tens of fields is problematic.
I made a copyField (indexed, not stored) for these fields, which gives way
Hello,
Do you use classic highlighter or fast vector highlighter?
Aurélien
On 30.07.2014 09:36, Manuel Le Normand wrote:
Hello,
I need to expose the search and highlighting capabilities over few tens of
fields. The edismax's qf param makes it possible but the time performances
for
Hi Per,
There's LUCENE-5675 which has added a new postings format for IDs. Trying
it out in Solr is in my todo list but maybe you can get to it before me.
https://issues.apache.org/jira/browse/LUCENE-5675
On Wed, Jul 30, 2014 at 12:57 PM, Per Steffensen st...@designware.dk
wrote:
On 30/07/14
Dear Solr User Group,
I need your help configuring the Solr schema properly. What I would like to do is this:
I have the following field in the schema:
<field name="url" type="string" indexed="true" stored="true"/>
Now I would like to have the same field value with a constant prefix, like:
<field name="wayback_url"
Hi,
I am building a new component that runs a new query depending on the previous
query's result.
The solrconfig.xml setting looks like this:
<arr name="components">
<str>query</str>
<str>newComponent</str>
<str>facet</str>
<str>mlt</str>
<str>highlight</str>
<str>stats</str>
I opened https://issues.apache.org/jira/browse/SOLR-6301
On Wed, Jul 30, 2014 at 1:35 PM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
Hi Per,
There's LUCENE-5675 which has added a new postings format for IDs. Trying
it out in Solr is in my todo list but maybe you can get to it
On Wed, Jul 30, 2014 at 3:21 PM, Eichstädt, Konrad
konrad.eichsta...@sbb.spk-berlin.de wrote:
Now I would have the same field value with a constant prefix like:
<field name="wayback_url" type="string" indexed="false" stored="true"/>
Your source value in the Clone URP is mis-spelt. So that might be part
Hi, All.
I ran into an issue when sending lots of docs to a 2-node SolrCloud. My environment has
one collection on 2 nodes. That collection has 2 shards with 2 replicas of
each shard.
We are using Solr 4.7.
We found this warning while sending docs to the SolrCloud. And we noticed
one
Currently I use the classic highlighter, but I can change my postings format in order to
work with another highlighting component if that leads to any solution
Working backwards slightly, what do you think SolrCloud is going to give
you, apart from the consistency of the index (which you want to turn off)?
What are all the other benefits of SolrCloud, if you are querying
separate instances that aren't guaranteed to be in sync (since you want to
use the
Hi,
I need some help with Solr faceting.
How do I facet on two fields at the same time to get combination facets and
their counts?
I'm using the query below to get facets for the combination of language and
binding. But now I'm getting only the selected facet in the facetList of each
field, with its count.
I am not sure I fully understood your question, but I would start by
looking at Tagging and Excluding first:
https://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters
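A minimal sketch of the tag/exclude pattern described on that wiki page, assuming the field names language and binding from the question. The helper name build_params is hypothetical, and the code only builds request parameters; it does not contact Solr.

```python
from urllib.parse import urlencode


def build_params(selected_language):
    """Filter on one language while still counting every language in the facet."""
    return [
        ("q", "*:*"),
        ("facet", "true"),
        # Tag the filter so that individual facets can opt out of it.
        ("fq", "{!tag=lang}language:%s" % selected_language),
        # Exclude the tagged filter when counting the language facet.
        ("facet.field", "{!ex=lang}language"),
        # The binding facet is still restricted by the language filter.
        ("facet.field", "binding"),
    ]


params = build_params("English")
query_string = urlencode(params)
```

The result set is filtered by the selected language, while the language facet is counted as if that filter were absent. Whether this matches the desired counts depends on the use case.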
Regards,
Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter:
Hi Daniel,
well, I assume there is a performance difference on host B between
a) getting some ready-made segments from host A (master, taking care of
indexing) to host B (slave, taking care of answering queries)
and
b) host B (along with host A) doing all the work necessary to prepare
Hi,
an example. We have 2 records with this data in the same field
(description):
1: Lufthutze vor Kühler Bj 62-65, DS
2: Kühler HY im Austausch, Altteilpfand 250 Euro
A search with the parameter 'description:Kühler' produces this debug output:
2.3234584 = (MATCH)
weight(description:kühler in
Hi
I am not sure exactly what LUCENE-5675 does, but reading the description
it seems to me that it would help finding out that there is no document
(having an id-field) where version-field is less than some-version. As
far as I can see this will not help finding out if a document with
Hi,
Please see : https://issues.apache.org/jira/browse/SOLR-3925
Ahmet
On Wednesday, July 30, 2014 2:39 PM, Thomas Michael Engelke
thomas.enge...@posteo.de wrote:
Hi,
an example. We have 2 records with this data in the same field
(description):
1: Lufthutze vor Kühler Bj 62-65, DS
2:
Hi all,
while SolrCell works nicely when in need of indexing binary documents, I am
wondering about the possibility of having Lucene / Solr documents that have
binaries in specific Lucene fields, e.g. title="a nice doc",
name="blabla.doc", binary=0x1234
In that case the binary field should have
In my Solr index there is a date field (published_date) whose values are in this format:
2012-09-26T10:08:09.123Z
How can I search with a simple input like 2012-09-10 instead of the full ISO date
format?
Is it possible in Solr?
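One possible approach, sketched under assumptions: Solr date fields expect the full ISO form, so the client can expand a plain day into a range query using Solr date math. The helper name day_range_query is hypothetical; the field name published_date is taken from the question.

```python
from datetime import datetime


def day_range_query(field, day):
    """Expand 'YYYY-MM-DD' into a Solr range query covering that whole day."""
    # strptime raises ValueError if the input is not a valid date.
    parsed = datetime.strptime(day, "%Y-%m-%d")
    start = parsed.strftime("%Y-%m-%dT00:00:00Z")
    # '[' includes the start instant; '}' excludes the start of the next day.
    return "%s:[%s TO %s+1DAY}" % (field, start, start)


query = day_range_query("published_date", "2012-09-10")
```

The resulting string can be passed as q or fq. The mixed bracket form assumes a Solr version that accepts a closed lower bound with an open upper bound.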
--
View this message in context:
Hi All,
I have a question regarding the use of HttpSolrServer (SolrJ).
I have a collection of SolrInputDocuments I want to send to Solr as a batch.
Now, let's assume that one of the docs inside this collection is corrupted
(missing some required field).
When I send the batch of docs to solr
Yesterday I upgraded my dev server to Solr 4.9, and also upgraded a
third-party plugin to a new version that's compatible with Solr 4.9.
After the index was rebuilt, each shard was 28GB ... but before the
upgrade, each shard was only 20GB. The number of documents per shard
(16.4 million)
Hi Alex,
As you said, if we exclude the language facet field, it will return all the
language facets with their counts, right?
It will not filter by the binding facet field of type 'paperback'; how can we
do this?
Thanks and regards,
Vamshi.
On Jul 30, 2014 4:11 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:
This is the new configuration:
<fieldType name="text" class="solr.TextField"
positionIncrementGap="100">
<analyzer type="index">
<charFilter class="solr.HTMLStripCharFilterFactory"/>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.ShingleFilterFactory"
Hi Smitha,
Have you looked at facet queries? They allow you to attach Solr queries to
facets. The drawback is that you need to know all possible
combinations of language and binding (or make an initial query to find this
information).
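As a rough illustration of that idea, the sketch below generates one facet.query per language/binding combination. The helper name and the sample values are hypothetical, and it assumes single-token field values (values with spaces would need quoting).

```python
from itertools import product


def combination_facet_queries(languages, bindings):
    """Return one facet.query string per (language, binding) pair."""
    return [
        "language:%s AND binding:%s" % (language, binding)
        for language, binding in product(languages, bindings)
    ]


facet_queries = combination_facet_queries(
    ["English", "French"], ["paperback", "ebook"]
)
# Each entry is sent as its own facet.query parameter.
params = [("q", "*:*"), ("facet", "true")]
params += [("facet.query", fq) for fq in facet_queries]
```

Solr returns a count for each facet.query, keyed by the query string, so the client can map the counts back to the combinations it generated.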
Agreed that this is a problem with Solr. If it was merely bad input, Solr
should be returning a 4xx error.
I don't know if we already have a Jira for this. If not, one should be
filed.
There are two issues:
1. The status code should be 4xx with an appropriate message about bad
input.
2.
This is the analysis page:
Please help me now.
On Wed, Jul 30, 2014 at 8:08 PM, sunshine glass
sunshineglassof2...@gmail.com wrote:
This is the new configuration:
<fieldType name="text" class="solr.TextField"
positionIncrementGap="100">
<analyzer type="index">
<charFilter
Doesn't hl.fl work in this case? Or is highlighting the 10 fields the
slowdown?
Best,
Erick
On Wed, Jul 30, 2014 at 2:55 AM, Manuel Le Normand
manuel.lenorm...@gmail.com wrote:
Currently I use the classic but I can change my posting format in order to
work with another highlighting component
Sorry for the confusion between "legacy" and "traditional", it's just
sloppy terminology. There's no sense of "don't use this" with traditional
M/R replication. In fact, when SolrCloud nodes need to catch up with their
indexes if they're very out of sync, this is still used. So it's definitely
Hmmm, might a custom update processor do that? In an update
processor, you'd get the binary and be able to do anything at all
you wanted to with that. I'm not quite clear on how the binary
gets through the Tika bits and gets passed in in the first place,
but
Best,
Erick
On Wed, Jul 30, 2014
I assume you've optimized? Or otherwise ensured that there aren't
any deleted docs?
Best,
Erick
On Wed, Jul 30, 2014 at 6:27 AM, Shawn Heisey s...@elyograg.org wrote:
Yesterday I upgraded my dev server to Solr 4.9, and also upgraded a
third-party plugin to a new version that's compatible
On 7/30/2014 9:10 AM, Erick Erickson wrote:
I assume you've optimized? Or otherwise insured that there aren't
any deleted docs
It's all straight indexing with DIH from MySQL, so there really are no
deleted docs, but about an hour after the rebuild finished, one of the
shards did get
Solr effectively supports only one binary document that gets indexed.
This is because you are not actually indexing the document. You are
extracting metadata (e.g. Author) and content fields out of it and mapping
them to the Solr document. So, it makes no sense to have two fields
that are binary because
On 7/30/2014 9:16 AM, Shawn Heisey wrote:
On 7/30/2014 9:10 AM, Erick Erickson wrote:
I assume you've optimized? Or otherwise insured that there aren't
any deleted docs
It's all straight indexing with DIH from MySQL, so there really are no
deleted docs, but about an hour after the rebuild
On 7/30/2014 10:00 AM, Shawn Heisey wrote:
It may turn out that this is actually a bug in merging, where old
segments are not getting deleted. I noticed in the optimized index that
there is a single large segment of about 20GB and a bunch of other
segments that are all older than the single
You're right. I misunderstood. I thought that you wanted to optimize the
find-by-id path, which is typically used for comparing versions during
inserts in Solr.
Yes, it won't help with the case where the ID does not exist.
On Wed, Jul 30, 2014 at 6:14 PM, Per Steffensen st...@designware.dk
Hello, fellow Solr and Lucene users and developers!
In our project we receive text from users in different languages. We
detect the language automatically and use Google Translate APIs a lot (so
having an arbitrary number of languages in our system doesn't concern us).
However we need to be able
I know BasisTech.com has a plugin for elasticsearch that extends
stemming/lemmatization to work across 40 natural languages.
I'm not sure what they have for Solr, but I think something like that may
exist as well.
Cheers,
-Chris.
From: Eugene
Hi
I am getting the exception "Processing of multipart/form-data request
failed".
My solrconfig.xml contains:
<requestParsers enableRemoteStreaming="true"
multipartUploadLimitInKB="512"
formdataUploadLimitInKB="2048"
I used admin/collections?action=SPLITSHARD to create shard1_0 and shard1_1,
and then followed this thread
http://lucene.472066.n3.nabble.com/How-can-you-move-a-shard-from-one-SolrCloud-node-to-another-td4106815.html
to move the shards to the right nodes. Problem solved.
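For reference, a sketch of how such a SPLITSHARD request URL might be assembled. The base URL and the collection name collection1 are assumptions, not taken from the message; the shard name shard1 is inferred from the resulting shard1_0 and shard1_1.

```python
from urllib.parse import urlencode


def splitshard_url(base_url, collection, shard):
    """Build a Collections API URL that asks Solr to split one shard."""
    params = urlencode(
        {"action": "SPLITSHARD", "collection": collection, "shard": shard}
    )
    return "%s/admin/collections?%s" % (base_url.rstrip("/"), params)


# Hypothetical host and collection name, for illustration only.
url = splitshard_url("http://localhost:8983/solr", "collection1", "shard1")
```

The function only builds the URL; sending it and then moving the sub-shards to other nodes, as the linked thread describes, are separate steps.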
Hi Eugene,
In a system we built a couple of years ago, we had a mixed English and
French corpus (and Spanish on the way, but that was implemented by the client
after we handed off). We had different fields for each language. So (title,
body) for English docs was (title_en, body_en), for French
Hi,
I noticed that Solr indexes all folders and files, including
hidden files.
Can anyone help me avoid indexing hidden files?
Thanks,
Ameya
Hi Ameya,
Did you mean to post to the ManifoldCF user mailing list?
Or are you referring to the java -jar post.jar utility?
Ahmet
On Wednesday, July 30, 2014 11:15 PM, Ameya Aware ameya.aw...@gmail.com wrote:
Hi,
I noticed a fact that Solr indexes all the folders and files including
hidden files.
Can anyone
The slowdown occurs during search, not highlighting. Having a disjunctive
query with 50 terms running 20 different posting lists is a hard task.
Harder than searching these 50 terms on a single (larger) posting list as
in the copyField case.
With the edismax qf param, sure, hl.fl=* works as it
bq: Is there a way to search the global copyField but highlight the original
stored fields?
That's what I was suggesting. Specify the global field for your search, but
use hl.fl for the fields you want to highlight.
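A sketch of what such a request could look like, assuming a catch-all copyField named text_all and stored fields title and body (all three names are hypothetical). It only builds the parameters and does not contact Solr.

```python
from urllib.parse import urlencode


def search_and_highlight_params(user_query, search_field, highlight_fields):
    """Search one catch-all field but request snippets from other stored fields."""
    return {
        "q": user_query,
        "defType": "edismax",
        "qf": search_field,                   # the single field that is searched
        "hl": "true",
        "hl.fl": ",".join(highlight_fields),  # stored fields to build snippets from
        # "false" is Solr's default: terms matched on the catch-all field
        # may be highlighted in the other fields.
        "hl.requireFieldMatch": "false",
    }


params = search_and_highlight_params("cooling fan", "text_all", ["title", "body"])
query_string = urlencode(params)
```

Searching a single field avoids running the disjunction over many postings lists, while highlighting still reads the original stored fields.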
And yes, storing the fields is required for highlighting. Consider stemming
(or worse,
Hi Lee,
You can use :
final DocList docList = rb.getResults().docList;
And if you want to access individual field values, use SolrPluginUtils' static
method to obtain a SolrDocumentList:
SolrDocumentList solrDocs = docListToSolrDocumentList(rb.getResults().docList,
req.getSearcher(), fields);
Hi,
I'm trying to get results in a single Solr call through multiple
group.query definitions. I'm getting the results I want but, each group is
presented under a name consisting of the query used for that group.
I'd like to change the name of each group to some meaningful name
instead. I'm
Is there a way to index time or date ranges? That is, assume 2 docs:
#1: date = 2014-01-01
#2: date = 2014-02-01 through 2014-05-01
Would there be a way to index #2's date as a single field and have all the
search options you usually get with time/date?
One strategy could be to index the start
For fancier versions, some people have used geo coordinates to represent the start
on the X axis and the stop on the Y axis, and then used perimeter bounds to find overlaps.
There was a discussion on the list about that a while ago.
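To illustrate the idea in plain Python (not Solr's spatial API), the sketch below maps each duration to a point with x = start and y = end, and tests overlap as a rectangle check. The function names are hypothetical; the two sample docs come from the question.

```python
from datetime import date


def as_point(start, end):
    """Map a duration to a 2D point: x = start day, y = end day."""
    if end < start:
        raise ValueError("end must not precede start")
    return (start.toordinal(), end.toordinal())


def overlaps(point, query_start, query_end):
    """A duration overlaps the query range if it starts on or before the
    query's end (x bound) and ends on or after the query's start (y bound)."""
    x, y = point
    return x <= query_end.toordinal() and y >= query_start.toordinal()


doc1 = as_point(date(2014, 1, 1), date(2014, 1, 1))  # a single day
doc2 = as_point(date(2014, 2, 1), date(2014, 5, 1))  # a three-month range
```

A single date is simply a duration whose start and end coincide, so both kinds of document can live in the same field.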
Regards,
Alex
On 31/07/2014 6:26 am, Ryan Cutter ryancut...@gmail.com wrote:
Is there a
Hi Ryan,
On 07/31/2014 01:26 AM, Ryan Cutter wrote:
Is there a way to index time or date ranges? That is, assume 2
docs:
#1: date = 2014-01-01 #2: date = 2014-02-01 through 2014-05-01
Would there be a way to index #2's date as a single field
Thanks for all the replies - I should have made clear that the first
thing I did was confirm that everything on the PHP side is UTF-8. The
web pages, the input text, the input files etc. The browser confirms
that the encoding is UTF-8 for all of the web pages, the response
headers as inspected by
The wiki page on the technique cleans up some small errors from Hoss’s
presentation:
http://wiki.apache.org/solr/SpatialForTimeDurations
But please try Solr trunk which has first-class support for date durations:
https://issues.apache.org/jira/browse/SOLR-6103
Soonish I’ll back-port to 4x.
~
Hi Ahmet,
it’s working :)
Thank you
Chunki.
On Jul 31, 2014, at 7:48 AM, Ahmet Arslan iori...@yahoo.com.INVALID wrote:
Hi Lee,
You can use :
final DocList docList = rb.getResults().docList;
And if you want to access individual field values, use solrpluginutils'
static method to
Hi All,
We have tried both the exclude option and facet queries. Neither approach
gives us the desired results.
I will explain a little further. I have first-level facets, Paperback and
Ebook, and second-level facets that include a list of languages like English,
French, etc.
When user
Hi All,
Currently I am using Solr's legacy distributed configuration (not SolrCloud;
a single Solr server with multiple shards).
I need to write a query that gets one particular document (by id) from
one shard and all documents from the other shards.
Can you please help me get this query right?
Now it sounds like maybe you have nested facets as opposed to just
different ones. See if one of these fits your use case better:
http://wiki.apache.org/solr/HierarchicalFaceting
Regards,
Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: