Ant is very good at this sort of thing, and easier for Java devs to learn
than Make. Python has a module called Fabric that is also very fine, but
for my dev ops it is another thing to learn.
I tend to divide things into three categories:
- Things that have to do with system setup, and need
It seems that you have secured Solr so thoroughly that you cannot now run
bin/solr status!
bin/solr has no arguments as yet for providing a username/password - as a
mostly-user like you, I'm not sure of the roadmap.
I think you should relax those restrictions a bit and try again.
On Fri, Sep 11,
On Sep 12, 2015 at 9:40 AM, Dan Davis <dansm...@gmail.com> wrote:
> It seems that you have secured Solr so thoroughly that you cannot now run
> bin/solr status!
>
> bin/solr has no arguments as yet for providing a username/password - as a
> mostly user like you I'm not sure of the
rk on start/restart.
>
> - Kevin
>
> > On Sep 5, 2015, at 8:45 AM, Dan Davis <dansm...@gmail.com> wrote:
> >
> > Kevin & Noble,
> >
> > I'll take it on to test this. I've built from source before, and I've
> > wanted this authorization capa
=CREATE and so on...
On Thu, Sep 10, 2015 at 11:10 PM, Dan Davis <dansm...@gmail.com> wrote:
> Kevin & Noble,
>
> I've manually verified the fix for SOLR-8000, but not yet for SOLR-8004.
>
> I reproduced the initial problem with reloading security.json after
> restarting
Kevin & Noble,
I'll take it on to test this. I've built from source before, and I've
wanted this authorization capability for a while.
On Fri, Sep 4, 2015 at 9:59 AM, Kevin Lee wrote:
> Noble,
>
> Does SOLR-8000 need to be re-opened? Has anyone else been able to
Hi Doug, nice write-up and 2 questions:
- You write your own QParser plugins - can one keep the features of edismax
for field boosting/phrase-match boosting by subclassing edismax? Assuming
yes...
- What do pf2 and pf3 do in the edismax query parser?
hon-lucene-synonyms plugin links
Steve,
You gave as an example:
Ich bestätige mit meiner Unterschrift, dass alle Angaben korrekt und
vollständig sind
(German: "I confirm with my signature that all information is correct and complete.")
This sentence is probably from the PDF form label content, rather than form
values. Sometimes in PDF, the form's value fields are kept in a separate
file. I'm 99% sure Tika
Steve,
Are you using ExtractingRequestHandler / DataImportHandler or extracting
the text content from the PDF outside of Solr?
On Wed, Apr 22, 2015 at 6:40 AM, steve.sch...@t-systems.com wrote:
Hi guys,
hopefully you can help me with my issue. We are using a solr setup and
have the
+1 - I like Erick's answer. Let me know if that turns out to be the
problem - I'm interested in this problem and would be happy to help.
On Wed, Apr 22, 2015 at 11:11 AM, Erick Erickson erickerick...@gmail.com
wrote:
Are they not _indexed_ correctly or not being displayed correctly?
Take a
Where you want true Role-Based Access Control (RBAC) on each index (core or
collection), one solution is to buy Solr Enterprise from LucidWorks.
My personal practice is mostly dictated by financial decisions:
- Each core/index has its configuration directory in a Git
repository/branch
Sangeetha,
You can also run Tika directly from data import handler, and Data Import
Handler can be made to run several threads if you can partition the input
documents by directory or database id. I've done 4 threads by having a
base configuration that does an Oracle query like this:
But you can potentially still use Solr dedupe if you do the upfront work
(in RDBMS or NoSQL pre-index processing) to assign some sort of Group ID.
See OCLC's FRBR Work-Set Algorithm,
http://www.oclc.org/content/dam/research/activities/frbralgorithm/2009-08.pdf?urlm=161376
, for some details on
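For what it's worth, the Group ID idea can be sketched roughly like this (a toy normalization in Python - the real FRBR Work-Set Algorithm does far more):

```python
import hashlib
import re

def work_group_id(title: str, author: str) -> str:
    """Toy FRBR-style group key: normalize title/author, then hash.
    Records that differ only in case or punctuation share a Group ID."""
    def norm(s: str) -> str:
        s = re.sub(r"[^a-z0-9 ]", " ", s.lower())
        return " ".join(s.split())
    key = norm(title) + "/" + norm(author)
    return hashlib.md5(key.encode("utf-8")).hexdigest()

# Two near-duplicate catalog records map to the same Group ID:
a = work_group_id("Moby Dick, or the Whale", "MELVILLE, HERMAN")
b = work_group_id("moby dick or the whale", "Melville Herman")
```

With the Group ID stored in a field, Solr-side dedupe or grouping can collapse the records.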
As an application developer, I have to agree with this direction. I ran
ManifoldCF and Solr together in the same Tomcat, and the slf4j
configurations of the two conflicted with strange results. From a systems
administrator/operations perspective, a separate install allows better
packaging, e.g.
for
docs on different shards.
On Wed, Feb 4, 2015 at 9:06 PM, Dan Davis dansm...@gmail.com wrote:
Doesn't relevancy for that assume that the IDF and TF for user1 and user2
are not too different? SolrCloud still doesn't use a distributed IDF,
correct?
On Wed, Feb 4, 2015 at 7:05 PM, Gili Nachum
It looks like you are returning the transformed ID, along with some other
fields, in the deltaQuery command. deltaQuery should only return the ID,
without the stk_ prefix, and then deltaImportQuery should retrieve the
transformed ID. I'd suggest:
entity ...
deltaQuery=SELECT id WHERE
/DataImportHandlerDeltaQueryViaFullImport
Hope this helps,
Dan
On Thu, Feb 5, 2015 at 9:30 PM, Dan Davis dansm...@gmail.com wrote:
It looks like you are returning the transformed ID, along with some other
fields, in the deltaQuery command. deltaQuery should only return the ID,
without the stk_
.
Hope this helps,
Dan Davis
On Wed, Feb 4, 2015 at 3:02 AM, Mikhail Khludnev mkhlud...@griddynamics.com
wrote:
Suresh,
There are a few common workarounds for such a problem. But I think that
submitting more than maxIndexingThreads is not really productive. Also, I
think that out-of-memory
handling of both databases and Solr, as does Java with JDBC and SolrJ.
Pushing to Solr probably has more legs than Data Import Handler going
forward.
On Wed, Feb 4, 2015 at 11:13 AM, Dan Davis dansm...@gmail.com wrote:
Suresh and Meena,
I have solved this problem by taking a row count on a query
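To make the approach above concrete - a rough sketch (a hypothetical helper, not part of Data Import Handler) of splitting an id range into per-thread buckets after taking the row count:

```python
def partition_ranges(min_id: int, max_id: int, threads: int):
    """Split [min_id, max_id] into contiguous, near-equal ranges so each
    DIH configuration can run an id BETWEEN lo AND hi query."""
    total = max_id - min_id + 1
    size = -(-total // threads)  # ceiling division
    ranges = []
    lo = min_id
    while lo <= max_id:
        hi = min(lo + size - 1, max_id)
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges
```

Each range then becomes the WHERE clause of one DIH entity, so the threads never overlap.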
Doesn't relevancy for that assume that the IDF and TF for user1 and user2
are not too different? SolrCloud still doesn't use a distributed IDF,
correct?
On Wed, Feb 4, 2015 at 7:05 PM, Gili Nachum gilinac...@gmail.com wrote:
Alright. So shard splitting and composite routing plays nicely
Hoss et al.,
I'm not intending to contribute documentation in any immediate sense (the
disclaimer), but I thank you all for the clarification.
It makes some sense to require a committer to review each suggested piece
of official documentation, but I wonder abstractly how a non-committer then
The Data Import Handler isn't pushing data into the /update request
handler. However, Data Import Handler can be extended with transformers.
Two such transformers are the TemplateTransformer and the
ScriptTransformer. It may be possible to get a script function to load
your custom Java code.
, Dan Davis dansm...@gmail.com wrote:
The Data Import Handler isn't pushing data into the /update request
handler. However, Data Import Handler can be extended with transformers.
Two such transformers are the TemplateTransformer and the
ScriptTransformer. It may be possible to get a script
I've been thinking of https://wiki.apache.org/solr/ as the Old Wiki and
https://cwiki.apache.org/confluence/display/solr as the New Wiki.
I guess that's the wrong way to think about it - Confluence is being used
for the Solr Reference Guide, and MoinMoin is being used as a wiki.
Is this the
For this I prefer TemplateTransformer to RegexTransformer - it's not a
regex, just a pattern, and so TemplateTransformer should be more efficient.
A script will also work, of course.
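A rough Python analogue of what TemplateTransformer does - plain placeholder substitution, no regex engine (illustrative only, not DIH's actual implementation):

```python
def template_transform(row: dict, template: str) -> str:
    """Fill ${field} placeholders from the row, the way DIH's
    TemplateTransformer fills in column values."""
    out = template
    for name, value in row.items():
        out = out.replace("${" + name + "}", str(value))
    return out
```

Because the placeholder is a literal token, there is no pattern compilation or backtracking per row.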
On Tue, Jan 27, 2015 at 5:54 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:
On 27 January 2015 at
Glad it worked out.
On Fri, Jan 23, 2015 at 9:50 PM, Carl Roberts carl.roberts.zap...@gmail.com
wrote:
NVM
I figured this out. The problem was this: pk=link in
rss-data-config.xml, but the unique id in schema.xml is not link - it is id.
From rss-data-config.xml:
<entity name="cve-2002"
        pk="link"
I have seen such errors by looking under Logging in the Solr Admin UI.
There is also the LogTransformer for Data Import Handler.
However, it is a design choice in Data Import Handler to skip fields not in
the schema. I would suggest you always use Debug and Verbose to do the
first couple of
I think copying to a new Solr date field is your best bet, because then you
have the flexibility to do date range facets in the future.
If you can re-index, and are using Data Import Handler, Jim Musil's
suggestion is just right.
If you can re-index, and are not using Data Import Handler:
-
Is Jetty actually running on port 80? Do you have an Apache2 reverse proxy
in front?
On Mon, Jan 26, 2015 at 11:02 PM, Summer Shire shiresum...@gmail.com
wrote:
Hi All,
Running solr (4.7.2) locally and hitting the admin page like this works
just fine http://localhost:8983/solr/
Cannot get any easier than jQuery UI's autocomplete widget -
http://jqueryui.com/autocomplete/
Basically, you set some classes and implement a JavaScript handler that calls
the server to get the autocomplete data. I would never expose Solr to
browsers, so I would have the AJAX call go to a PHP script
- Original Message -
From: Dan Davis dansm...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Monday, January 26, 2015 12:08:13 AM
Subject: [MASSMAIL]Weighting of prominent text in HTML
By examining solr.log, I can see that Nutch is using the /update request
handler rather than
Dan Davis, Systems/Applications Architect
National Library of Medicine
the index replicates.
There are a bunch of other reasons to go to SolrCloud, but you know your
problem space best.
FWIW,
Erick
On Sun, Jan 25, 2015 at 9:26 AM, Shawn Heisey apa...@elyograg.org wrote:
On 1/24/2015 10:56 PM, Dan Davis wrote:
When I polled the various projects
, Jan 25, 2015 at 9:26 AM, Shawn Heisey apa...@elyograg.org wrote:
On 1/24/2015 10:56 PM, Dan Davis wrote:
When I polled the various projects already using Solr at my
organization, I
was greatly surprised that none of them were using Solr replication,
because they had
When I polled the various projects already using Solr at my organization, I
was greatly surprised that none of them were using Solr replication,
because they had talked about replicating the data.
But we are not Pinterest, and do not expect to be taking in changes one
post at a time (at least the
Why re-write all the document conversion in Java? ;) Tika is very slow, and
a 5 GB PDF is very big.
If you have a lot of PDFs like that, try pdftotext in HTML and UTF-8 output
mode. The HTML mode captures some metadata that would otherwise be lost.
If you need to go faster still, you can also
The suggester is not working for me with Solr 4.10.2
Can anyone shed light over why I might be getting the exception below when
I build the dictionary?
<response>
  <lst name="responseHeader">
    <int name="status">500</int>
    <int name="QTime">26</int>
  </lst>
  <lst name="error">
    <str name="msg">len must be &lt;= 32767; got 35680</str>
  </lst>
</response>
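That error is Lucene's hard cap on term length in bytes. If you cannot change the analysis chain, one workaround is to trim oversized field values before indexing - a sketch (assuming UTF-8, and that truncation is acceptable for suggestion text):

```python
def truncate_utf8(text: str, max_bytes: int = 32766) -> str:
    """Trim a string so its UTF-8 encoding fits under the byte cap,
    without splitting a multi-byte character at the cut point."""
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text
    # errors="ignore" drops a partial trailing multi-byte sequence
    return data[:max_bytes].decode("utf-8", errors="ignore")
```

This would run in whatever pre-index processing feeds the suggester field.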
how do you run it, particularly - e.g.,
what do you download exactly, and what is the command line?
On Fri, Dec 5, 2014 at 11:37 PM, Dan Davis dansm...@gmail.com wrote:
I have a script transformer and a log transformer, and I'm not seeing the
log messages, at least not where I expect.
Is there any way I
I am having some trouble getting the suggester to work. The spell
requestHandler is working, but I didn't like the results I was getting from
the word breaking dictionary and turned them off.
So some basic questions:
- How can I check on the status of a dictionary?
- How can I see what is
help here?
See: https://cwiki.apache.org/confluence/display/solr/Query+Re-Ranking
Best,
Erick
On Fri, Jan 9, 2015 at 10:35 AM, Dan Davis dansm...@gmail.com wrote:
I have a requirement to spotlight certain results if the query text
exactly
matches the title or see reference
Related question -
I see mention of needing to rebuild the spellcheck/suggest dictionary after
solr core reload. I see spellcheckIndexDir in both the old wiki entry and
the solr reference guide
https://cwiki.apache.org/confluence/display/solr/Spell+Checking. If this
parameter is provided, it
Thanks,
Dan Davis
What about the frequency comparison - I haven't used the spellchecker
heavily, but it seems that if bnak is in the database, but bank is much
more frequent, then bank should be a suggestion anyway...
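The frequency heuristic described above, sketched in Python (Solr itself exposes related knobs such as spellcheck.onlyMorePopular; this is just the idea):

```python
def prefer_more_frequent(term: str, candidates: list, freq: dict):
    """Return the candidate correction whose index frequency exceeds the
    query term's own frequency, preferring the most frequent one."""
    base = freq.get(term, 0)
    better = [c for c in candidates if freq.get(c, 0) > base]
    return max(better, key=lambda c: freq[c]) if better else None
```

So even though "bnak" exists in the index, the far more frequent "bank" wins.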
On Wed, Dec 17, 2014 at 10:41 AM, Erick Erickson erickerick...@gmail.com
wrote:
First, I'd look
I would say that you could determine a row that gives a bad URL, and then
run it in the DIH admin interface (or on the command line) with debug enabled.
The url parameter going into Tika should be present in its transformed form
before the next entity gets going. This works in a similar scenario for
me.
When I have a forEach attribute like the following:
forEach=/medical-topics/medical-topic/health-topic[@language='English']
And then need to match an attribute of that, is there any alternative to
spelling it all out:
<field column="url"
.
Is there any short-hand for the current node or the match?
On Mon, Dec 8, 2014 at 4:42 PM, Dan Davis dansm...@gmail.com wrote:
When I have a forEach attribute like the following:
forEach=/medical-topics/medical-topic/health-topic[@language='English']
And then need to match an attribute
-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
On 8 December 2014 at 17:01, Dan Davis dansm...@gmail.com wrote:
In experimentation with a much simpler and smaller XML file, it doesn't
look like '//health-topic/@url' will work, nor will '//@url'
Yes, that worked quite well. I still need the //tagname but that is the
only DIH incantation I need. This will substantially accelerate things.
On Mon, Dec 8, 2014 at 5:37 PM, Dan Davis d...@danizen.net wrote:
The problem is that XPathEntityProcessor implements XPath on its own
I have a script transformer and a log transformer, and I'm not seeing the
log messages, at least not where I expect.
Is there any way I can simply log a custom message from within my script?
Can the script easily interact with its container's logger?
a bit of
tuning and tweaking, but you'll be fine eventually. Document processing
will be the fun part. As you come to scaling the zoo of components, this
will become evident :-)
What is the volume and influx rate in your scenario?
Best regards,
--Jürgen
On 04.11.2014 22:01, Dan Davis wrote:
I'm
All,
The problem here was that I gave driver=BinURLDataSource rather than
type=BinURLDataSource. Of course, saying driver=BinURLDataSource
caused it not to be able to find it.
I'm trying to do research for my organization on the best practices for
open source pipeline/connectors. Since we need Web Crawls, File System
crawls, and Databases, it seems to me that Manifold CF might be the best
case.
Has anyone combined ManifoldCF with Solr UpdateRequestProcessors or
On 4 November 2014 16:01, Dan Davis dansm...@gmail.com wrote:
I'm trying to do research for my organization on the best practices for
open source pipeline/connectors. Since we need
I always, always have a web application running that accepts the JavaScript
AJAX call and then forwards it on to the Apache Solr request handler. Even
if you don't control the web application, and can only add JavaScript, you
can put up a API oriented webapp somewhere that only protects Solr for
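The point of such a front-end webapp is mostly to constrain what reaches Solr. A minimal sketch of the parameter filtering it might do (the whitelist here is an assumption - tune it to your request handlers):

```python
ALLOWED_PARAMS = {"q", "fq", "start", "rows", "sort", "fl"}

def sanitize(params: dict) -> dict:
    """Keep only whitelisted query parameters before proxying to Solr,
    so browsers cannot send things like qt, stream.url, or shards."""
    return {k: v for k, v in params.items() if k in ALLOWED_PARAMS}
```

The proxy then forwards only the sanitized dict to a fixed Solr URL and handler.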
This seems a little abstract. What I'd do is double check that the SQL is
working correctly by running the stored procedure outside of Solr and see
what you get. You should also be able to look at the corresponding
.properties file and see the inputs used for the delta import. If the data
I had a problem with the ant eclipse answer - it was unable to resolve
javax.activation for the Javadoc. Updating
solr/contrib/dataimporthandler-extras/ivy.xml
as follows did the trick for me:
- <dependency org="javax.activation" name="activation"
    rev="${/javax.activation/activation}" conf="compile->*"/>
What I want to do is to pull an URL out of an Oracle database, and then use
TikaEntityProcessor and BinURLDataSource to go fetch and process that
URL. I'm having a problem with this that seems general to JDBC with Tika
- I get an exception as follows:
Exception in entity :
.
-- Forwarded message --
From: Dan Davis d...@danizen.net
Date: 10 October 2014 15:00
Subject: Re: Tika Integration problem with DIH and JDBC
To: Alexandre Rafalovitch arafa...@gmail.com
The definition of <dataSource name="bin" type="BinURLDataSource"/> is in
each of the dih-*.xml
I don't keep up with this list well enough to know whether anyone else
answered. I don't know how to do it in jetty.xml, but you can certainly
tweak the code. java.net.Socket has a method setTcpNoDelay() that
corresponds with the standard Unix system calls.
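For reference, the same option in Python (the analogue of java.net.Socket's setTcpNoDelay(true)); whether Jetty exposes it in jetty.xml I still don't know:

```python
import socket

# Disable Nagle's algorithm on a TCP socket via TCP_NODELAY
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
sock.close()
```

The underlying setsockopt(2) call is what any of these languages end up making.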
Long-time past, my suggestion of
Summary - when constraining a search using filter query, how can I exclude
the constraint for a particular facet?
Detail - Suppose I have the following facet results for a query *mainquery*:
<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="foo">
      <int name="A">491</int>
      <int
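(For the record, Solr supports this via tagged filters that can be excluded per facet - the {!tag=...}/{!ex=...} local params used for multi-select faceting. Building such a request, sketched in Python:)

```python
from urllib.parse import urlencode

# Constrain results by foo:A, but compute the foo facet as if that
# filter were not applied, by tagging the fq and excluding the tag.
params = [
    ("q", "*:*"),
    ("fq", "{!tag=fooTag}foo:A"),
    ("facet", "true"),
    ("facet.field", "{!ex=fooTag}foo"),
]
query_string = urlencode(params)
```

The facet counts for foo then still include B, C, etc., even though the result set only contains A.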
You could copy the existing core to a new core every once in a while, and
then do your delta indexing into a new core once the copy is complete. If
a Persistent URL for the search results included the name of the original
core, the results you would get from a bookmark would be stable. However,
This could be an operating systems problem rather than a Solr problem.
CentOS 6.4 (Linux kernel 2.6.32) may have some issues with page flushing,
and I would read up on that.
The VM parameters can be tuned in /etc/sysctl.conf
On Sun, Aug 25, 2013 at 4:23 PM, Furkan KAMACI
On Tue, Aug 27, 2013 at 2:03 AM, Paul Libbrecht p...@hoplahup.net wrote:
Dan,
if you're bound to federated search then I would say that you need to work
on the service guarantees of each of the nodes and, maybe, create
strategies to cope with bad nodes.
paul
+1
I'll think on that.
On Tue, Aug 27, 2013 at 3:33 AM, Bernd Fehling
bernd.fehl...@uni-bielefeld.de wrote:
Years ago, when Federated Search was a buzzword, we did some development
and
testing with Lucene, FAST Search, Google and several other search engines
regarding Federated Search in a library context.
The
On Mon, Aug 26, 2013 at 9:06 PM, Amit Jha shanuu@gmail.com wrote:
Would you like to create something like
http://knimbus.com
I work at the National Library of Medicine. We are moving our library
catalog to a newer platform, and we will probably include articles. The
article's content
magic.
Best
Erick
On Fri, Aug 16, 2013 at 4:07 PM, Dan Davis dansm...@gmail.com wrote:
I've thought about it, and I have no time to really do a meta-search
during
evaluation. What I need to do is to create a single core that contains
both of my data sets, and then describe
if you
do not exercise guarantees of remote sources.
Or are the remote cores below actually things that you manage on your
side? If yes, guarantees are easy to manage.
Paul
On 26 August 2013 at 22:38, Dan Davis wrote:
I have now come to the task of estimating man-days to add Blended
One more question here - is this topic more appropriate to a different list?
On Mon, Aug 26, 2013 at 4:38 PM, Dan Davis dansm...@gmail.com wrote:
I have now come to the task of estimating man-days to add Blended Search
Results to Apache Solr. The argument has been made
be careful with drop_caches - make sure you sync first
On Thu, Aug 22, 2013 at 1:28 PM, Jean-Sebastien Vachon
jean-sebastien.vac...@wantedanalytics.com wrote:
I was afraid someone would tell me that... thanks for your input
-Original Message-
From: Toke Eskildsen
Suppose I have two documents with different id, and there is another field,
for instance content-hash which is something like a 16-byte hash of the
content.
Can Solr be configured to return just one copy, and drop the other if both
are relevant?
If Solr does drop one result, do you get any
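The 16-byte content hash mentioned can be computed like this (Solr's own dedupe support is SignatureUpdateProcessorFactory; this only illustrates the field value):

```python
import hashlib

def content_hash(text: str) -> str:
    """16-byte (128-bit) MD5 of the whitespace-normalized, lowercased
    body - two docs with the same content get the same hash."""
    normalized = " ".join(text.split()).lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()
```

Grouping or collapsing on that field is then what returns just one copy.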
Ah, but what is the definition of punctuation in Solr?
On Wed, Aug 21, 2013 at 11:15 PM, Jack Krupansky j...@basetechnology.com wrote:
I thought that the StandardTokenizer always split on punctuation,
Proving that you haven't read my book! The section on the standard
tokenizer details the
that makes sense. But I don't know
how you'd just get the right thing to happen with some kind
of scoring magic.
Best
Erick
On Fri, Aug 16, 2013 at 4:07 PM, Dan Davis dansm...@gmail.com wrote:
I've thought about it, and I have no time to really do a meta-search
during
evaluation. What I need
OK - I see that this can be done with Field Collapsing/Grouping. I also
see the mentions in the Wiki for avoiding duplicates using a 16-byte hash.
So, question withdrawn...
On Thu, Aug 22, 2013 at 10:21 PM, Dan Davis dansm...@gmail.com wrote:
Suppose I have two documents with different id
This is an interesting topic - my employer is a medical library, and there
are many keywords that may need to be aliased in various ways, and 2- or
3-word phrases that perhaps should be treated specially. Jack, can you give
me an example of how to do that sort of thing? Perhaps I need to buy
I've thought about it, and I have no time to really do a meta-search during
evaluation. What I need to do is to create a single core that contains
both of my data sets, and then describe the architecture that would be
required to do blended results, with liberal estimates.
From the perspective
I am considering enabling a true Federated Search, or meta-search, using
the following basic configuration (this configuration is only for
development and evaluation):
Three Solr cores:
- One to search data I have indexed locally
- One with a custom SearchHandler that is a facade, e.g. it