I realize you want to avoid putting depth details into the field values, but
something has to imply the depth. So with that in mind, here is another
approach, assuming you are chasing down a single branch of a tree (and all its
sub-branch offshoots):
Use dynamic fields
Step
Boon,
I expect you will find many definitions of “proper usage” depending upon
context and expected results. Personally, I don't believe this is Solr's job to
enforce, and there are many ways through the use of directives in the servlet
container layer that can allow restrictions if you feel
Hakim,
That is what Boost Query (bq=) does.
http://wiki.apache.org/solr/DisMaxQParserPlugin#bq_.28Boost_Query.29
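A minimal illustration of the parameter (the query term, field, value, and boost weight here are invented for the example):

```
q=ipod&defType=dismax&bq=field:value^5.0
```

Documents matching the bq clause get their scores boosted without the clause restricting the result set.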
Jason
On Jun 2, 2014, at 10:58 AM, Hakim Benoudjit h.benoud...@gmail.com wrote:
Hi guys,
Is it possible in solr to boost documents having a field value (Ex.
field:value)?
I
Marc,
Fundamentally it’s a good solution design to always be capable of reposting
(reindexing) your data to Solr. You are demonstrating a classic use case of
this, which is upgrade. Is there a critical reason why you are avoiding this
step?
Jason
On May 30, 2014, at 10:38 AM, Marc
I’m also not sure I understand the practical purpose of your hard/soft auto
commit settings. You are stating the following:
Every 10 seconds I want data written to disk, but not be searchable.
Every 15 seconds I want data to be written into memory and searchable.
I would consider whether your
I just realized I failed my own reading comprehension :)
You have maxDocs, not maxTime for hard commit. Please disregard.
On May 30, 2014, at 1:46 PM, Jason Hellman jhell...@innoventsolutions.com
wrote:
I’m also not sure I understand the practical purpose of your hard/soft auto
commit
Gregg,
I don’t have an answer to your question but I’m very curious what use case you
have that permits such arbitrary partial-results. Is it just an edge case or
do you want to permit a common occurrence?
Jason
On May 30, 2014, at 3:05 PM, Gregg Donovan gregg...@gmail.com wrote:
I'd like
This. And so much this. As much this as you can muster.
On Apr 7, 2014, at 1:49 PM, Michael Della Bitta
michael.della.bi...@appinions.com wrote:
The speed of ingest via HTTP improves greatly once you do two things:
1. Batch multiple documents into a single request.
2. Index with multiple
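A rough sketch of point 1 in Python, using only the standard library. The update URL, batch size, and field names are assumptions for illustration, not taken from the thread:

```python
import json
import urllib.request

def chunk(docs, size):
    """Split a list of documents into batches of at most `size` items."""
    return [docs[i:i + size] for i in range(0, len(docs), size)]

def post_batch(batch, url="http://localhost:8983/solr/update/json"):
    # One HTTP request carries a whole batch instead of one doc per request.
    req = urllib.request.Request(
        url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)

# 250 docs become 3 requests rather than 250.
docs = [{"id": str(i), "title_t": "doc %d" % i} for i in range(250)]
batches = chunk(docs, 100)
```

Each call to `post_batch` would then carry 100 documents in one round trip, which is where most of the ingest speedup comes from.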
Juan,
Pay close attention to the boundary scanner you’re employing:
http://wiki.apache.org/solr/HighlightingParameters#hl.boundaryScanner
You can be explicit to indicate a type (hl.bs.type) with options such as
CHARACTER, WORD, SENTENCE, and LINE. The default is WORD (as the wiki
indicates)
Gregg,
The QueryResultCache caches a sorted int array of results matching a query.
This should overlap very nicely with your desired behavior, as a hit in this
cache will neither perform a Lucene query nor need to calculate scores.
Now, ‘for the life of the Searcher’ is the trick here.
Here’s a rather obvious question: have you rebuilt your spell index recently?
Is it possible the offending numbers snuck into the spell dictionary? The
terms component will show you what’s in your current, searchable field…but not
the dictionary.
If my memory serves correctly, with
Thinking in terms of normalized data in the context of a Lucene index is
dangerous. It is not a relational data model technology, and the join
behaviors available to you have limited use. Each approach requires
compromises that are likely impermissible for certain use cases.
If it is at
Whether you use the same machines as Solr or separate machines is a matter
suited to taste.
If you are the CTO, then you should make this decision. If not, inform
management that risk conditions are greater when you share function and control
on a single piece of hardware. A single failure
To a very large extent, the capability of a platform is measurable by the skill
of the team administering it.
If core competencies lie in Windows OS then I would wager heavily the platform
will outperform a similar Linux OS installation in the long haul.
All things being equal, it’s really
I second this notion.
My reasoning focuses mostly on maintainability, where I posit that your client
code will be far easier to extend/modify/troubleshoot than any effort spent
attempting to do this within Solr.
Jason
On Dec 23, 2013, at 12:07 PM, Joel Bernstein joels...@gmail.com wrote:
I
David,
I find Mike McCandless’ blog article to be very informative. Give it a go and
let us know if you are still seeking clarification:
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
Jason
On Nov 7, 2013, at 5:09 AM, david.dav...@correo.aeat.es wrote:
Hi,
You can, of course, use a function range query:
select?q=text:news&fq={!frange l=0 u=100}sum(x,y)
http://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/search/FunctionRangeQParserPlugin.html
This will give you a bit more flexibility to meet your goal.
On Nov 7, 2013, at 7:26 AM, Erik
Nutch is an excellent option. It should feel very comfortable for people
migrating away from the Google appliances.
Apache Droids is another possible approach, and I've found people using
Heritrix or ManifoldCF for various use cases (and usually in combination with
other use cases where
It is probable that with no additional boost to pf fields the sum of the
scores will be higher. But it is *possible* that they are not, and adding a
boost to pf gives greater probability that they will be.
All of this bears testing to confirm what search use cases merit what level of
If I sage Otis’ intent here it is to create shards on the basis of intervals of
time. A shard represents a single interval (let’s say a year’s worth of data)
and when that data is no longer necessary it is simply shut down and no longer
included in queries.
So, for example, you could have
Keep in mind that DataStax has a custom update handler, and as such isn't
exactly a vanilla Solr implementation (even though in many ways it still is).
Since updates are co-written to Cassandra and Solr you should always tread a
bit carefully when slightly outside what they perceive to be
If you consider what n-grams do this should make sense to you. Consider the
following piece of data:
White iPod
If the field is fed through a bigram filter (n-gram with size of 2) the
resulting token stream would appear as such:
wh hi it te
ip po od
The usual use of n-grams is to match
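The token stream above can be reproduced with a tiny sketch (assuming, for illustration, that the field is lowercased before the n-gram filter runs):

```python
def ngrams(token, n=2):
    # Slide an n-wide window across the token, emitting one gram per position.
    return [token[i:i + n] for i in range(len(token) - n + 1)]

# Reproduces the bigram stream from the "White iPod" example above.
white = ngrams("white")  # ['wh', 'hi', 'it', 'te']
ipod = ngrams("ipod")    # ['ip', 'po', 'od']
```

Any query term that shares one of these grams can match the document, which is why n-grammed fields match partial and misspelled input.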
The limitations on how many threads you can use to load data are primarily
driven by factors on your hardware: CPU, heap usage, I/O, and the like. It is
common for most index load processes to be able to handle more incoming data on
the Solr side of the equation than can typically be loaded
As an endorsement of Erick's take, the primary benefit I see to processing
through your own code is better error-, exception-, and logging-handling which
is trivial for you to write.
Consider that your code could reside on any server, either receiving through a
PUSH or PULLing the data from
The best use case I see for atomic updates typically involves avoiding
transmission of large documents for small field updates. If you are updating a
readCount field of a PDF document that is 1MB in size you will avoid
resending the 1MB PDF document's data in order to increment the readCount
Very specifically, what is the field definition that is being used for the
suggestions?
On Oct 10, 2013, at 5:49 AM, Furkan KAMACI furkankam...@gmail.com wrote:
What is your configuration for auto suggestion?
2013/10/10 ar...@skillnetinc.com ar...@skillnetinc.com
Hi,
We are
The shards.qt parameter is the easiest one to forget, with the most dramatic of
consequences!
On Oct 8, 2013, at 11:10 AM, shamik sham...@gmail.com wrote:
James,
Thanks for your reply. The shards.qt did the trick. I read the
documentation earlier but was not clear on the implementation,
I don't know if there's a way to accomplish your goal directly, but as a pure
workaround, you can write a routine to fetch all the stored values and resubmit
the document without the field in question. This is what atomic updates do,
minus the overhead of the transmission.
On Oct 7, 2013, at
fq=here:there OR this:that
For the lurker: an AND should be:
fq=here:there&fq=this:that
While you can, technically, pass:
fq=here:there AND this:that
Solr will cache the separate fq= parameters and reuse them in any context. The
AND(ed) filter will be cached as a single
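The caching difference can be pictured with a toy model of Solr's filterCache (an illustration only, not Solr's actual code): each distinct fq string becomes one cache key, so separate fq= parameters yield independently reusable entries.

```python
# Toy filterCache: fq string -> set of matching doc ids.
filter_cache = {}

def apply_fq(fq, match_fn, docs):
    # A repeated fq string is a cache hit; the cached doc-set is reused.
    if fq not in filter_cache:
        filter_cache[fq] = {d["id"] for d in docs if match_fn(d)}
    return filter_cache[fq]

docs = [
    {"id": 1, "here": "there", "this": "that"},
    {"id": 2, "here": "there"},
]
a = apply_fq("here:there", lambda d: d.get("here") == "there", docs)
b = apply_fq("this:that", lambda d: d.get("this") == "that", docs)
combined = a & b  # intersected per request, like separate fq= parameters
```

An ANDed single fq would instead produce one cache entry keyed on the whole expression, reusable only when that exact combination recurs.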
Utkarsh,
Check to see if the value is actually indexed into the field by using the Terms
request handler:
http://localhost:8983/solr/terms?terms.fl=text&terms.prefix=d
(adjust the prefix to whatever you're looking for)
This should get you going in the right direction.
Jason
On Sep 17, 2013,
They have modified the mechanisms for committing documents…Solr in DSE is not
stock Solr...so you are likely encountering a boundary where stock Solr
behavior is not fully supported.
I would definitely reach out to them to find out if they support the request.
On Sep 5, 2013, at 8:27 AM, Ryan,
The circumstance in which I've most typically seen index.timestamp show up is when
an update is sent to a slave server. The replication then appears to preserve
the updated slave index in a separate folder while still respecting the correct
data from the master.
On Sep 5, 2013, at 8:03 PM, Shawn
One additional thought here: from a paranoid risk-management perspective it's
not a good idea to have two critical services dependent upon a single point of
failure if the hardware fails. Obviously risk-management is suited to taste,
so you may feel the cost/benefit does not merit the
are not parallel as in the ticket.
-Kevin
On Tue, Aug 13, 2013 at 8:40 PM, Jason Hellman
jhell...@innoventsolutions.com wrote:
While I don't have a past history of this issue to use as reference, if I
were in your shoes I would consider trying your updates with softCommit
disabled. My
It's been my experience that using the convenient feature to change the output
key still doesn't save you from having to map it back to the field name
underlying it in order to trigger the filter query. With that in mind it just
makes more sense to me to leave the effort in the View portion
While I don't have a past history of this issue to use as reference, if I were
in your shoes I would consider trying your updates with softCommit disabled.
My suspicion is you're experiencing some issue with the transaction logging and
how it's managed when your hard commit occurs.
If you can
The majority of the behavior outlined in that wiki page should work quite
sufficiently for 3.5.0. Note that there are only a few items that are marked
Solr4.0 only (DirectSolrSpellChecker and WordBreakSolrSpellChecker, for
example).
On Aug 9, 2013, at 6:26 AM, Kamaljeet Kaur
Or shingles, presuming you want to tokenize and output unigrams.
On Aug 2, 2013, at 11:33 AM, Walter Underwood wun...@wunderwood.org wrote:
Search against a field using edge N-grams. --wunder
On Aug 2, 2013, at 11:16 AM, T. Kuro Kurosaka wrote:
Is there a query parser that supports a
Ben,
This could be constructed as so:
fl=date_deposited&fq=date:[2013-07-01T00:00:00Z TO
2013-07-31T23:59:00Z]&fq=collection_id:(1 2 n)&q.op=OR
The parentheses around the 1 2 n set indicate a boolean query, and we're
ensuring the values are ORed together via the q.op parameter.
This should get you the
Nitin,
You need to ensure the fields you wish to see are marked stored=true in your
schema.xml file, and you should include fields in your fl= parameter
(fl=*,score is a good place to start).
Jason
On Jul 29, 2013, at 8:08 AM, Nitin Agarwal 2nitinagar...@gmail.com wrote:
Hi, I am using Solr
Or use the copyField technique to a single searchable field and set df= to that
field. The example schema does this with the field called text.
On Jul 29, 2013, at 8:35 AM, Ahmet Arslan iori...@yahoo.com wrote:
Hi,
df is a single valued parameter. Only one field can be a default field.
Jonathan,
Please note the openSearcher=false part of your configuration. This is why you
don't see documents. The commits are occurring, and being written to segments
on disk, but they are not visible to the search engine because a Solr searcher
class has not opened them for visibility.
You
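The shape of the settings being described might look like this in solrconfig.xml (values are illustrative, not Jonathan's actual configuration):

```xml
<!-- Hard commits flush segments to disk but open no searcher,
     so committed docs remain invisible to queries. -->
<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<!-- Soft commits open a new searcher, making docs visible. -->
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>
```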
Steven,
Some information can be gleaned from the system admin request handler:
http://localhost:8983/solr/admin/system
I am specifically looking at this:
<lst name="core"><str name="schema">example</str></lst>
Mind you, that is a manually-set value in the schema file. But just in case
you want to get crazy
Also consider using the SweetSpotSimilarityFactory class, which allows you to
still engage normalization but control how intrusive it is. This, combined
with the ability to set a custom Similarity class on a per-fieldType basis may
be extremely useful.
More info:
Saqib:
At the simplest level:
1) Source the machine
2) Install Java
3) Install a servlet container of your choice
4) Copy your Solr WAR and conf directories as desired (probably a rough mirror
of your current single server)
5) Start it up and start sending data there
6) Query both by
Kevin,
I can imagine this working if you consider your second data center a pure slave
relationship to your SolrCloud cluster. I haven't tried it, but I don't see
why the solrconfig.xml can't identify as a master allowing you to call any of
your cores in the cluster to replicate out. That
Vinay,
What autoCommit settings do you have for your indexing process?
Jason
On Jun 24, 2013, at 1:28 PM, Vinay Pothnis poth...@gmail.com wrote:
Here is the ulimit -a output:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority
.
On Mon, Jun 24, 2013 at 1:54 PM, Jason Hellman
jhell...@innoventsolutions.com wrote:
Vinay,
What autoCommit settings do you have for your indexing process?
Jason
On Jun 24, 2013, at 1:28 PM, Vinay Pothnis poth...@gmail.com wrote:
Here is the ulimit -a output:
core file size
in knowing the symptoms of failure, to help
us troubleshoot the underlying problems if and when they arise.
Thanks,
Scott
On Monday, June 24, 2013, Jason Hellman wrote:
Vinay,
You may wish to pay attention to how many transaction logs are being
created along the way to your hard
Shalin,
There's one point to test without caches, which is to establish how much value
a cache actually provides.
For me, this primarily means providing a benchmark by which to decide when to
stop obsessing over caches.
But yes, for load testing I definitely agree :)
Jason
On Jun 21,
And let's not forget the interesting bug in MMapDirectory:
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/store/MMapDirectory.html
NOTE: memory mapping uses up a portion of the virtual memory address space in
your process equal to the size of the file
, Aloke Ghoshal alghos...@gmail.com wrote:
Barani - the fq option doesn't work.
Jason - the dynamic field option won't work due to the high number of
groups and users.
On Wed, Jun 12, 2013 at 1:12 AM, Jason Hellman
jhell...@innoventsolutions.com wrote:
Aloke,
If you do not have
Aloke,
If you do not have a factorial problem in the combination of userid and groupid
(which I can imagine you might) you could consider creating a field for each
combination (u1g1, u2g2) which can easily be done via dynamic fields. Use
CopyField to get data into these various constructs
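A sketch of what that could look like in schema.xml (field names like access_u1g1 are invented for the example):

```xml
<!-- One indexed string field per user/group combination,
     e.g. access_u1g1, access_u2g2, matched by the wildcard. -->
<dynamicField name="access_*" type="string"
              indexed="true" stored="false"/>
```

Filtering then becomes a cheap term query such as fq=access_u1g1:true, at the cost of one field per combination.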
Roman,
Could you be more specific as to why replication doesn't meet your
requirements? It was geared explicitly for this purpose, including the
automatic discovery of changes to the data on the index master.
Jason
On Jun 4, 2013, at 1:50 PM, Roman Chyla roman.ch...@gmail.com wrote:
OK,
Well, there is a hack(ish) way to do it:
_query_:"{!type=edismax qf='someField' v='$q' mm='100%'}"
This is clearly not a solrconfig.xml settings, but part of your query string
using LocalParam behavior.
This is going to get really messy if you have plenty of fields you'd like to
search, where
Those are default, though autoSoftCommit is commented out by default.
Keep in mind that the hard commit running every 15 seconds is not updating
your searchable data (due to the openSearcher=false setting). In
theory, your data should be searchable due to autoSoftCommit running every 1
Jamey,
You will need a load balancer on the front end to direct traffic into one of
your SolrCore entry points. It doesn't matter, technically, which one, though
you will find benefits to narrowing traffic to fewer (for purposes of better
cache management).
Internally SolrCloud will
You have mentioned Pivot Facets, but have you looked at the Path Hierarchy
Tokenizer Factory:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PathHierarchyTokenizerFactory
This matches your use case, as best as I understand it.
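A possible fieldType sketch (name and delimiter are assumptions): indexing "a/b/c" produces the tokens a, a/b, and a/b/c, so a filter on any ancestor path matches all descendants.

```xml
<fieldType name="path" class="solr.TextField">
  <analyzer type="index">
    <!-- Emits one token per path prefix. -->
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
  </analyzer>
  <analyzer type="query">
    <!-- Query side keeps the path whole for exact prefix matching. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```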
Jason
On May 28, 2013, at 12:47 PM, vibhoreng04
You may wish to explore the concept of using the Result Grouping (Field
Collapsing) feature in which your paragraphs are individual documents that
share a field to group them by (the ID of the document/book/article/whatever).
http://wiki.apache.org/solr/FieldCollapsing
This will net you
Sam,
I would highly suggest counting the words in your external pipeline and sending
that value in as a specific field. It can then be queried quite simply with a:
wordcount:{80 TO *]
(Note the { next to 80, excluding the value of 80)
Jason
On May 22, 2013, at 11:37 AM, Sam Lee
And use the /terms request handler to view what is present in the field:
/solr/terms?terms.fl=text_es&terms.prefix=a
You're looking to ensure the index does, in fact, have the accented characters
present. It's just a sanity check, but could possibly save you a little
(sanity, that is).
Jason
Most definitely not the number of unique elements in each segment. My 32
document sample index (built from the default example docs data) has the
following:
entry#0:
'StandardDirectoryReader(segments_b:29 _8(4.2.1):C32)'='manu_exact',class
Rishi,
Fantastic! Thank you so very much for sharing the details.
Jason
On May 17, 2013, at 12:29 PM, Rishi Easwaran rishi.easwa...@aol.com wrote:
Hi All,
Its Friday 3:00pm, warm sunny outside and it was a good week. Figured I'd
share some good news.
I work for AOL mail team and
The first rule of Solr without Unique Key is that we don't talk about Solr
without a Unique Key.
The second rule...
On May 16, 2013, at 8:47 PM, Jack Krupansky j...@basetechnology.com wrote:
Technically, core Solr does not require a unique key. A lot of features in
Solr do require unique
David,
A Pivot Facet could possibly provide these results by the following syntax:
facet.pivot=category,includes
We would presume that includes is a tokenized field and thus a set of facet
values would be rendered from the terms resolving from that tokenization. This
would be nested in each
I have run across plenty of implementations using just about every common
servlet container on the market, and haven't run across any common problems to
dissuade you against any one of them.
On the JVM front most people seem to use Oracle because of its ubiquity. But I
have also run across a
You learned the gosh-darndest things:
http://localhost:8983/solr/browse?q=ipod&boost=product(price,-2)&debugQuery=on
…nets:
-0.3797992 = (MATCH) sum of:
0.13510442 = (MATCH) max of:
0.045963455 = (MATCH) weight(text:ipod^0.5 in 4) [DefaultSimilarity],
result of:
0.045963455 =
Nicholas,
Also consider that some misspellings are better handled through Synonyms (or
injected metadata).
You can garner a great deal of value out of the spell checker by following the
great advice James is giving here…but you'll find a well-placed helper
synonym or metavalue can often
Milen,
At some point you'll need to call a commit to search your data, either via
AutoCommit policy or deterministically. There are various schools of thought on
which way to go, but something needs to do this.
If you go the AutoCommit route be sure to pay attention to the openSearcher
(additional cores may be added by the updating
machine depending on the imported data).
Thanks again and best regards!
Milen
-Ursprüngliche Nachricht-
Von: Jason Hellman [mailto:jhell...@innoventsolutions.com]
Gesendet: Freitag, 10. Mai 2013 17:30
An: solr-user@lucene.apache.org
One more tip on the use of filter queries.
DO: fq=name1:value1&fq=name2:value2&fq=namen:valuen
DON'T: fq=name1:value1 AND name2:value2 AND name3:value3
Where OR operators apply, this does not matter. But your Solr cache will be
much more savvy with the first construct.
Jason
On May 10,
And for 10,000 documents across n shards, that can be significant!
On May 10, 2013, at 11:43 AM, Joel Bernstein joels...@gmail.com wrote:
How many shards are in your collection? The query aggregator node will pull
back the results from each shard and hold the results in memory. Then it
will
Consider further that term vector data and highlighting becomes very useful if
you highlight externally to Solr. That is to say, you have the data stored
externally and wish to re-parse positions of terms (especially synonyms) from
source material. This is a (not too uncommon) technique used
Luis,
I am presuming you do not have an overarching grouping value here…and simply
wish to show a standard search result that shows 1 item per company.
You should be able to accomplish your second page of desired items (the second
item from each of your 20 represented companies) by using the
From:
http://lucene.apache.org/solr/4_3_0/changes/Changes.html#4.3.0.upgrading_from_solr_4.2.0
Slf4j/logging jars are no longer included in the Solr webapp. All logging jars
are now in example/lib/ext. Changing logging impls is now as easy as updating
the jars in this folder with those
Purely from empirical observation, both the DocumentCache and QueryResultCache
are being populated and reused in reloads of a simple MLT search. You can see
in the cache inserts how much extra-curricular activity is happening to
populate the MLT data by how many inserts and lookups occur on
If you nab the jars in example/lib/ext and place them within the appropriate
folder in Tomcat (and this will somewhat depend on which version of Tomcat you
are using…let's presume tomcat/lib as a brute-force approach) you should be
back in business.
On May 9, 2013, at 11:41 AM, richardg
the results that should be returned for each page of 20 items and
probably make several solr calls per page rendered.
On Thu, May 9, 2013 at 1:07 PM, Jason Hellman
jhell...@innoventsolutions.com wrote:
Luis,
I am presuming you do not have an overarching grouping value here…and
simply
I have to imagine I'm quibbling with the original assertion that Solr 4.x is
architected with a dependency on Zookeeper when I say the following:
Solr 4.x is not architected with a dependency on Zookeeper. SolrCloud,
however, is. As such, if a line of reasoning drives greater concern about