On 06/15/2018 12:37 PM, Ganesh Pal wrote:
Hey Friedrich,

The proposed solution worked nicely. Thank you for the reply, I really
appreciate it.


The only thing I think needs a review is whether the values of one
dictionary are assigned to the other dictionary correctly (lines
17 to 25 in the code below).


Here is my code :

root@X1:/Play_ground/SPECIAL_TYPES/REGEX# vim Friedrich.py
   1 import re
   2 from collections import OrderedDict
   3
   4 keys = ["struct", "loc", "size", "mirror",
   5         "filename","final_results"]
   6
   7 stats =  OrderedDict.fromkeys(keys)
   8
   9
  10 line = '06/12/2018 11:13:23 AM python toolname.py  --struct=data_block --log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 --path=/tmp/data_block.txt --size=8'
  11
  12 regex = re.compile (r"--(struct|loc|size|mirror|log_file)\s*=\s*([^\s]+)")
  13 result = dict(re.findall(regex, line))
  14 print result
  15
  16 if result['log_file']:
  17    stats['filename'] = result['log_file']
  18 if result['struct']:
  19    stats['struct'] = result['struct']
  20 if result['size']:
  21    stats['size'] = result['size']
  22 if result['loc']:
  23    stats['loc'] = result['loc']
  24 if result['mirror']:
  25    stats['mirror'] = result['mirror']
  26
  27 print stats
  28
Looks okay to me. If you read 'result' using 'get', you wouldn't need to test for the key. 'stats' would then have all keys, with the value None for keys missing in 'result':

stats['filename'] = result.get ('log_file')
stats['struct']   = result.get ('struct')

This may or may not suit your purpose.
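
As an illustration of the 'get' approach, here is a minimal sketch. The `result` dictionary is a hand-written stand-in for the output of `dict(re.findall(...))`, with 'size' deliberately left out, and the `source_key` mapping is a hypothetical helper that is not part of the original code.

```python
from collections import OrderedDict

# Hypothetical stand-in for the dict built from re.findall(); 'size' is
# omitted on purpose to show what happens with a missing key.
result = {"struct": "data_block", "log_file": "/var/1000111/test18.log",
          "loc": "0", "mirror": "10"}

keys = ["struct", "loc", "size", "mirror", "filename", "final_results"]
stats = OrderedDict.fromkeys(keys)

# Illustrative mapping: which key of 'result' feeds each key of 'stats'.
source_key = {"filename": "log_file", "struct": "struct", "size": "size",
              "loc": "loc", "mirror": "mirror"}

for stats_key, result_key in source_key.items():
    # dict.get returns None instead of raising KeyError for a missing key.
    stats[stats_key] = result.get(result_key)

print(stats)
```

With this input, 'size' and 'final_results' stay None while the other keys are filled in, and no KeyError is raised for the missing option.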

Also, I think the regex can just be
(r"--(struct|loc|size|mirror|log_file)=([^\s]+)")
There is no need to match whitespace characters (\s*) before and after the =
symbol, because this would never happen (this line is actually a key=value
pair of a dictionary getting logged).

You are right. I thought your sample line had a space in one of the groups and didn't reread to verify, letting the false impression take hold. Sorry about that.
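
A quick way to check this on the sample line is to run both variants side by side. This is only a sketch; the names `with_space` and `without_space` are made up for the comparison.

```python
import re

line = ('06/12/2018 11:13:23 AM python toolname.py  --struct=data_block '
        '--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 '
        '--path=/tmp/data_block.txt --size=8')

# Variant that tolerates whitespace around '=' and the stricter variant.
with_space = re.compile(r"--(struct|loc|size|mirror|log_file)\s*=\s*([^\s]+)")
without_space = re.compile(r"--(struct|loc|size|mirror|log_file)=([^\s]+)")

result_a = dict(with_space.findall(line))
result_b = dict(without_space.findall(line))

# On input that never has spaces around '=', both give the same dictionary.
print(result_a == result_b)
print(result_b)
```

The two patterns only differ on input such as "--loc = 0", which the logged key=value pairs are said never to contain.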

Frederic


Regards,
Ganesh






On Fri, Jun 15, 2018 at 12:53 PM, Friedrich Rentsch <
anthra.nor...@bluewin.ch> wrote:

Hi Ganesh. Having proposed a solution to your problem, it would be kind
of you to let me know whether it has helped. In case you missed my
response, I repeat it:

regex = re.compile (r"--(struct|loc|size|mirror|log_file)\s*=\s*([^\s]+)")
regex.findall (line)
[('struct', 'data_block'), ('log_file', '/var/1000111/test18.log'),
('loc', '0'), ('mirror', '10')]

Frederic


On 06/13/2018 07:32 PM, Ganesh Pal wrote:

On Wed, Jun 13, 2018 at 5:59 PM, Rhodri James <rho...@kynesim.co.uk>
wrote:

On 13/06/18 09:08, Ganesh Pal wrote:
Hi Team,
I want to parse a text file and extract a few fields that appear after
"=".


For example, from the line below I need to extract the values present
after --struct=, --loc=, --size= and --log_file=

Sample input

line = '06/12/2018 11:13:23 AM python toolname.py  --struct=data_block
--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10
--path=/tmp/data_block.txt size=8'

Did you mean "--size=8" at the end?  That's what your explanation
implied.



Yes James, you got it right. I meant "--size=8".


Hi Team,


I played further with Python's re.findall() and I am able to extract all
the required fields. I have two further questions; please advise.


Question 1:

   Please point out any mistakes in the code below and suggest whether it
can be optimized further with a better regex.


# This code has to extract the various fields from a single line of a log
# file (assuming the line has already been matched) and then store the
# extracted values in a dictionary.

import re

line = ('06/12/2018 11:13:23 AM python toolname.py  --struct=data_block '
        '--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 '
        '--path=/tmp/data_block.txt --size=8')

# loc is a number
r_loc = r"--loc=([0-9]+)"
r_size = r'--size=([0-9]+)'
r_struct = r'--struct=([A-Za-z_]+)'
r_log_file = r'--log_file=([A-Za-z0-9_/.]+)'


if re.findall(r_loc, line):
     print re.findall(r_loc, line)

if re.findall(r_size, line):
     print re.findall(r_size, line)

if re.findall(r_struct, line):
     print re.findall(r_struct, line)

if re.findall(r_log_file, line):
     print re.findall(r_log_file, line)


o/p:
root@X1:/Play_ground/SPECIAL_TYPES/REGEX# python regex_002.py
['0']
['8']
['data_block']
['/var/1000111/test18.log']
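
One possible way to avoid calling re.findall() twice per field while keeping the per-field patterns is to loop over them and store the first match of each in a dictionary. This is only a sketch: the `patterns` and `values` names are illustrative, and re.search() returns just the first occurrence, unlike re.findall().

```python
import re

line = ('06/12/2018 11:13:23 AM python toolname.py  --struct=data_block '
        '--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 '
        '--path=/tmp/data_block.txt --size=8')

# Same per-field patterns as above, keyed by an illustrative field name.
patterns = {
    "loc": r"--loc=([0-9]+)",
    "size": r"--size=([0-9]+)",
    "struct": r"--struct=([A-Za-z_]+)",
    "log_file": r"--log_file=([A-Za-z0-9_/.]+)",
}

values = {}
for name, pattern in patterns.items():
    match = re.search(pattern, line)  # one search per field
    if match:
        values[name] = match.group(1)

print(values)
```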


Question 2:

I tried to see whether I can use re.search with a lookbehind assertion. It
seems to work. Any comments or suggestions?

Example:

import re

line = ('06/12/2018 11:13:23 AM python toolname.py  --struct=data_block '
        '--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 '
        '--path=/tmp/data_block.txt --size=8')

match = re.search(r'(?P<loc>(?<=--loc=)([0-9]+))', line)
if match:
     print match.group('loc')


o/p: root@X1:/Play_ground/SPECIAL_TYPES/REGEX# python regex_002.py

0


I want to build the sub-patterns and use match.group() to get the values,
something like what is shown below, but it doesn't seem to work:


match = re.search(r'(?P<loc>(?<=--loc=)([0-9]+))'
                    r'(?P<size>(?<=--size=)([0-9]+))', line)
if match:
     print match.group('loc')
     print match.group('size')
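
A likely reason the combined pattern fails: the two pieces are concatenated, so the second lookbehind is tested at the position right after the --loc digits, where the preceding text is "--loc=0" and not "--size=". The sketch below shows that, plus one possible alternative that drops the lookbehinds and allows other text between the two options. The alternative assumes --loc appears before --size on the line; the variable names are illustrative.

```python
import re

line = ('06/12/2018 11:13:23 AM python toolname.py  --struct=data_block '
        '--log_file=/var/1000111/test18.log --addr=None --loc=0 --mirror=10 '
        '--path=/tmp/data_block.txt --size=8')

# Original combined pattern: 'size' would have to start exactly where 'loc'
# ends, yet be preceded by "--size=", which cannot both be true.
broken = re.search(r'(?P<loc>(?<=--loc=)([0-9]+))'
                   r'(?P<size>(?<=--size=)([0-9]+))', line)
print(broken)  # None

# Alternative sketch: plain named groups with a non-greedy gap in between.
# Assumes --loc comes before --size on the line.
match = re.search(r'--loc=(?P<loc>[0-9]+).*?--size=(?P<size>[0-9]+)', line)
if match:
    print(match.group('loc'))   # 0
    print(match.group('size'))  # 8
```

If the order of the options is not fixed, separate searches per option (or a single findall over all options) avoid the ordering assumption.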

Regards,
Ganesh



--
https://mail.python.org/mailman/listinfo/python-list
