Re: Looping through a series of telephone numbers

2023-04-02 Thread Philippe de Rochambeau
it at your own risk. Any and all responsibility for any loss, > damage or destruction of data or any other property which may arise from > relying on this email's technical content is explicitly disclaimed. The > author will in no case be liable for any monetary damages arising from

Re: Looping through a series of telephone numbers

2023-04-02 Thread Philippe de Rochambeau
y that saved my day > in the past when parsing phone numbers in Spark: > > https://github.com/google/libphonenumber > > If you combine it with Bjørn's suggestions you will have a good start on your > linkage task. > > Best regards, > Anastasios Zouzias

Re: Looping through a series of telephone numbers

2023-04-02 Thread Philippe de Rochambeau
ill in no case be liable for any monetary damages arising from such > loss, damage or destruction. > > > > On Sat, 1 Apr 2023 at 19:32, Philippe de Rochambeau <phi...@free.fr> wrote: >> Hello, >> I’m looking for an efficient way in Spark to sea

Looping through a series of telephone numbers

2023-04-01 Thread Philippe de Rochambeau
Hello, I’m looking for an efficient way in Spark to search for a series of telephone numbers, contained in a CSV file, in a data set column. In pseudo code,
for tel in [tel1, tel2, …, tel40,000]
    search for tel in dataset using .like("%tel%")
end for
I’m using the like function
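One way to avoid 40,000 separate `like` scans is to invert the search: scan each row once, pull out its digit runs, and test pieces of them against a set of the wanted numbers. The sketch below shows that idea in plain Python rather than Spark. The function names and sample values are hypothetical, and it assumes the wanted numbers are digits-only strings; none of this comes from the thread.

```python
import re

def build_index(wanted):
    """Return the wanted numbers as a set, plus the distinct lengths they have."""
    wanted = {w for w in wanted if w}
    return wanted, sorted({len(t) for t in wanted})

def find_numbers(text, wanted, lengths):
    """Return every wanted number that occurs inside a digit run of text.

    This mirrors like("%tel%") for digits-only numbers: a number also
    matches when it is embedded in a longer run of digits.
    """
    hits = set()
    for run in re.findall(r"\d+", text):
        for n in lengths:
            # Slide a window of size n over the digit run.
            for i in range(len(run) - n + 1):
                piece = run[i:i + n]
                if piece in wanted:
                    hits.add(piece)
    return hits

# Hypothetical sample data, for illustration only.
wanted, lengths = build_index(["0612345678", "0145678901"])
print(find_numbers("call 0612345678 today", wanted, lengths))
```

Each row is processed once regardless of how many numbers are in the list, since set membership is a constant-time lookup. In Spark the same function could in principle be applied per row, but that wiring is not shown here.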

Re: Log analysis with GraphX

2018-02-10 Thread Philippe de Rochambeau
ther analysis such as random forests or > Markov chains then graphx alone will not help you much. > >> On 10. Feb 2018, at 15:49, Philippe de Rochambeau <phi...@free.fr> wrote: >> >> Hello, >> >> Let’s say a website log is structured as follows:

Log analysis with GraphX

2018-02-10 Thread Philippe de Rochambeau
Hello, Let’s say a website log is structured as follows: ;;; e.g.
2018-01-02 12:00:00;OKK;PAG1;1234555
2018-01-02 12:01:01;NEX;PAG1;1234555
2018-01-02 12:00:02;OKK;PAG1;5556667
2018-01-02 12:01:03;NEX;PAG1;5556667
where OKK stands for the OK Button on Page 1, NEX, the Next Button on Page 2, …
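As a rough illustration of how such a log could become graph input, the sketch below parses the four sample lines and counts transitions between consecutive clicks. It is plain Python, not GraphX. The field names (timestamp, button, page, identifier) are assumptions inferred from the sample lines, and treating the last field as a visitor identifier is a guess. The resulting counts are the kind of weighted edge list a graph library would take.

```python
from collections import Counter, defaultdict

LOG = """\
2018-01-02 12:00:00;OKK;PAG1;1234555
2018-01-02 12:01:01;NEX;PAG1;1234555
2018-01-02 12:00:02;OKK;PAG1;5556667
2018-01-02 12:01:03;NEX;PAG1;5556667
"""

def transitions(log_text):
    """Count (source, destination) click pairs per identifier.

    Assumed layout: timestamp;button;page;identifier.
    """
    per_id = defaultdict(list)
    for line in log_text.splitlines():
        if not line.strip():
            continue
        timestamp, button, page, ident = line.split(";")
        per_id[ident].append((timestamp, page + ":" + button))

    edges = Counter()
    for events in per_id.values():
        # Timestamps in this format sort correctly as plain strings.
        events.sort()
        nodes = [node for _, node in events]
        for src, dst in zip(nodes, nodes[1:]):
            edges[(src, dst)] += 1
    return edges

for (src, dst), count in transitions(LOG).items():
    print(src, "->", dst, count)
```

Grouping by identifier before pairing events keeps clicks from different visitors from being linked, even though their lines are interleaved in the log.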

Re: Newbie questions regarding log processing

2016-02-22 Thread Philippe de Rochambeau
Thank you to you both, Jorge and Mich. You've answered my questions in a quasi-realtime manner! I will look into Flume and HDFS. > Le 22 févr. 2016 à 22:41, Jorge Machado a écrit : > > To Get the that you could use Flume to ship the logs from the Servers to the > HDFS for

Newbie questions regarding log processing

2016-02-22 Thread Philippe de Rochambeau
Hello, I have a few newbie questions regarding Spark. Is Spark a good tool to process Web logs for attacks (or is it better to use a more specialized tool)? If so, are there any plugins for this purpose? Can you use Spark to weed out huge logs and extract only suspicious activities; e.g., 1000

processing files

2014-11-20 Thread Philippe de Rochambeau
Hello, I need to develop an application which:
- reads XML files in thousands of directories, two levels down, from year x to year y
- extracts data from image tags in those files and stores them in a SQL or NoSQL database
- generates ImageMagick commands based on the extracted data to